From volker.simonis at gmail.com Thu Jan 2 09:22:59 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Thu, 2 Jan 2014 18:22:59 +0100
Subject: RFR(S): JDK-8031134 : PPC64: implement printing on AIX
Message-ID:

Hi,

could somebody please review the following small change:

http://cr.openjdk.java.net/~simonis/webrevs/8031134/

It's the straightforward implementation of the basic printing
infrastructure on AIX and shouldn't have any impact on the existing
platforms. As always, this change is intended for the
http://hg.openjdk.java.net/ppc-aix-port/stage/jdk repository.

Thank you and best regards,
Volker

From Alan.Bateman at oracle.com Thu Jan 2 12:15:31 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Thu, 02 Jan 2014 20:15:31 +0000
Subject: RFR(S): JDK-8031134 : PPC64: implement printing on AIX
In-Reply-To:
References:
Message-ID: <52C5C8E3.8020503@oracle.com>

On 02/01/2014 17:22, Volker Simonis wrote:
> Hi,
>
> could somebody please review the following small change:
>
> http://cr.openjdk.java.net/~simonis/webrevs/8031134/
>
> It's the straightforward implementation of the basic printing
> infrastructure on AIX and shouldn't have any impact on the existing
> platforms. As always, this change is intended for the
> http://hg.openjdk.java.net/ppc-aix-port/stage/jdk repository.
>
cc'ing 2d-dev as that is where the printing code is maintained. Your
changes suggest that this code should probably be refactored at some
point to make it easier to add other Unix variants.

-Alan

From vladimir.kozlov at oracle.com Mon Jan 6 10:04:47 2014
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 06 Jan 2014 10:04:47 -0800
Subject: RFR(XS): 8031188: Fix for 8029015: PPC64 (part 216): opto: trap based null and range checks
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8A4D2@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2CE8A4D2@DEWDFEMB12A.global.corp.sap>
Message-ID: <52CAF03F.6050503@oracle.com>

Looks good.
Vladimir

On 1/6/14 2:14 AM, Lindenmaier, Goetz wrote:
> A happy new year everybody!
>
> Could you please review and test this tiny fix?
>
> It's in code so far only used by ppc64.
>
> http://cr.openjdk.java.net/~goetz/webrevs/8031188-nc/
>
> fixup_flow can add new blocks with a branch instruction. Doing
> so after a trap based check added the branch behind the wrong
> Proj node. The branch would jump to the proper target, but build_oop_map
> analyses a wrong control flow.
>
> Fix: Swap the Projs in the block list so that the new block is added behind
> the proper node.
>
> Thanks & best regards,
>
> Goetz
>

From goetz.lindenmaier at sap.com Mon Jan 6 02:14:50 2014
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 6 Jan 2014 10:14:50 +0000
Subject: RFR(XS): 8031188: Fix for 8029015: PPC64 (part 216): opto: trap based null and range checks
Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8A4D2@DEWDFEMB12A.global.corp.sap>

A happy new year everybody!

Could you please review and test this tiny fix?
It's in code so far only used by ppc64.
http://cr.openjdk.java.net/~goetz/webrevs/8031188-nc/

fixup_flow can add new blocks with a branch instruction. Doing
so after a trap based check added the branch behind the wrong
Proj node. The branch would jump to the proper target, but build_oop_map
analyses a wrong control flow.

Fix: Swap the Projs in the block list so that the new block is added behind
the proper node.

Thanks & best regards,
Goetz

From goetz.lindenmaier at sap.com Mon Jan 6 02:16:50 2014
From: goetz.lindenmaier at sap.com (goetz.lindenmaier at sap.com)
Date: Mon, 06 Jan 2014 10:16:50 +0000
Subject: hg: ppc-aix-port/jdk7u/hotspot: ppc: Fix issue in trap based null check optimization
Message-ID: <20140106101706.A15CD623DE@hg.openjdk.java.net>

Changeset: 3cc52fb61873
Author: goetz
Date: 2014-01-06 10:50 +0100
URL: http://hg.openjdk.java.net/ppc-aix-port/jdk7u/hotspot/rev/3cc52fb61873

ppc: Fix issue in trap based null check optimization

! src/share/vm/opto/block.cpp
! src/share/vm/opto/block.hpp

From vladimir.kozlov at oracle.com Mon Jan 6 16:45:42 2014
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Tue, 07 Jan 2014 00:45:42 +0000
Subject: hg: ppc-aix-port/stage/hotspot: 8031188: Fix for 8029015: PPC64 (part 216): opto: trap based null and range checks
Message-ID: <20140107004546.D9E94623FD@hg.openjdk.java.net>

Changeset: 4345c6a92f35
Author: goetz
Date: 2014-01-06 11:02 +0100
URL: http://hg.openjdk.java.net/ppc-aix-port/stage/hotspot/rev/4345c6a92f35

8031188: Fix for 8029015: PPC64 (part 216): opto: trap based null and range checks
Summary: Swap the Projs in the block list so that the new block is added behind the proper node.
Reviewed-by: kvn

! src/share/vm/opto/block.cpp

From david.holmes at oracle.com Mon Jan 6 22:33:18 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 07 Jan 2014 16:33:18 +1000
Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap>
 <5293F087.2080700@oracle.com>
 <5293FE15.9050100@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap>
 <52948FF1.5080300@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap>
 <5295DD0B.3030604@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap>
 <52968167.4050906@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap>
Message-ID: <52CB9FAE.80800@oracle.com>

Hi Goetz,

Happy New year! :)

I'm backing up a step now that I have a better grip on the IRIW issue.

On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> ok, I understand the tests are wrong. It's good this issue is settled.
> Thanks Aleksey and Andreas for going into the details of the proof!
>
> About our change: David, the causality is the other way round.
> The change is about IRIW.
> 1. To pass IRIW, we must use sync instructions before loads.

Okay I see this (though I think we need to kill IRIW as a requirement
and I don't think IRIW should form part of any conformance test).

> 2. If we do syncs before loads, we don't need to do them after stores.

True for the IRIW case but to prevent local reordering of volatile
stores don't you still need some form of "storestore" barrier?

initially: x = 0; y = 0

Thread 0:       Thread 1:
x = 1           r1 = x
y = 1           r2 = y
---------------------------
Should be Forbidden: r1 = 0 ^ r2 = 1

You can add the sync after each read but that doesn't prevent the
stores in thread 0 from being locally re-ordered.
So I think you still need at least lwsync on the writer side:

initially: x = 0; y = 0

Thread 0:       Thread 1:
x = 1           r1 = x
lwsync          sync
y = 1           r2 = y
                sync
---------------------------
Forbidden: r1 = 0 ^ r2 = 1

David
-----

> 3. If we don't do them after stores, we fail the volatile constructor tests.
> 4. So finally we added them again at the end of the constructor after stores
> to pass the volatile constructor tests.
>
> We originally passed the constructor tests because the ppc memory order
> instructions are not as fine-granular as the
> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction
> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the
> MemBarVolatile after the store fixes the constructor tests. The proper representation
> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless
> anyways.
>
>> I'm not happy with the ifdef approach but I won't block it.
> I'd be happy to add a property
>     OrderAccess::cpu_is_multiple_copy_atomic()
> or the like to guard the customization. I'd like that much better. Or also
>     OrderAccess::needs_support_iriw_ordering()
>     VM_Version::needs_support_iriw_ordering()
>
>
> Best regards,
> Goetz.
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Thursday, 28 November 2013 00:34
> To: Lindenmaier, Goetz
> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> TL;DR version:
>
> Discussion on the c-i list has now confirmed that a constructor-barrier
> for volatiles is not required as part of the JMM specification. It *may*
> be required in an implementation that doesn't pre-zero memory to ensure
> you can't see uninitialized fields. So the tests for this are invalid
> and this part of the patch is not needed in general (ppc64 may need it
> due to other factors).
>
> Re: "multiple copy atomicity" - first thanks for correcting the term :)
> Second thanks for the reference to that paper! For reference:
>
> "The memory system (perhaps involving a hierarchy of buffers and a
> complex interconnect) does not guarantee that a write becomes visible to
> all other hardware threads at the same time point; these architectures
> are not multiple-copy atomic."
>
> This is the visibility issue that I referred to and affects both ARM and
> PPC. But of course it is normally handled by using suitable barriers
> after the stores that need to be visible. I think the crux of the
> current issue is what you wrote below:
>
> > The fixes for the constructor issue are only needed because we
> > remove the sync instruction from behind stores (parse3.cpp:320)
> > and place it before loads.
>
> I hadn't grasped this part. Obviously if you fail to do the sync after
> the store then you have to do something around the loads to get the same
> results! I still don't know what led you to the conclusion that the
> only way to fix the IRIW issue was to put the fence before the load -
> maybe when I get the chance to read that paper in full it will be clearer.
>
> So ... the basic problem is that the current structure in the VM has
> hard-wired one choice of how to get the right semantics for volatile
> variables. You now want to customize that but not all the requisite
> hooks are present. It would be better if volatile_load and
> volatile_store were factored out so that they could be implemented as
> desired per-platform. Alternatively there could be pre- and post- hooks
> that could then be customized per platform. Otherwise you need
> platform-specific ifdef's to handle it as per your patch.
>
> I'm not happy with the ifdef approach but I won't block it. I think this
> is an area where a lot of clean up is needed in the VM. The barrier
> abstractions are a confused mess in my opinion.
>
> Thanks,
> David
> -----
>
> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> I updated the webrev to fix the issues mentioned by Vladimir:
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>
>> I did not yet add the
>>     OrderAccess::needs_support_iriw_ordering()
>>     VM_Version::needs_support_iriw_ordering()
>> or
>>     OrderAccess::cpu_is_multiple_copy_atomic()
>> to reduce #defines, as I got no further comment on that.
>>
>>
>> WRT the validity of the tests and the interpretation of the JMM
>> I feel not in the position to contribute substantially.
>>
>> But we would like to pass the torture test suite as we consider
>> this a substantial task in implementing a PPC port. Also we think
>> both tests show behavior a programmer would expect. It's bad if
>> Java code runs fine on the more common x86 platform, and then
>> fails on ppc. This will always first be blamed on the VM.
>>
>> The fixes for the constructor issue are only needed because we
>> remove the sync instruction from behind stores (parse3.cpp:320)
>> and place it before loads. Then there is no sync between volatile store
>> and publishing the object. So we add it again in this one case
>> (volatile store in constructor).
>>
>>
>> @David
>>>> Sure. There also is no solution as you require for the taskqueue problem yet,
>>>> and that's being discussed now for almost a year.
>>> It may have started a year ago but work on it has hardly been continuous.
>> That's not true, we did a lot of investigation and testing on this issue.
>> And we came up with a solution we consider the best possible. If you
>> have objections, you should at least give the draft of a better solution,
>> we would volunteer to implement and test it.
>> Similarly, we invested time in fixing the concurrency torture issues.
>>
>> @David
>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>> can't find any reference to it.
>> We learned about this reading "A Tutorial Introduction to the ARM and
>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the taskqueue problem.
>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>
>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read'
>> instead. Sorry for that. (I also fixed that in the method name above).
>>
>> Best regards and thanks for all your involvements,
>> Goetz.
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Wednesday, 27 November 2013 12:53
>> To: Lindenmaier, Goetz
>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi Goetz,
>>
>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> -- Volatile in constructor
>>>> AFAIK we have not seen those tests fail due to a
>>>> missing constructor barrier.
>>> We see them on PPC64. Our test machines have typically 8-32 processors
>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>
>> And see follow ups - the tests are invalid.
>>
>>> -- IRIW issue
>>>> I can not possibly answer to the necessary level of detail with a few
>>>> moments thought.
>>> Sure. There also is no solution as you require for the taskqueue problem yet,
>>> and that's being discussed now for almost a year.
>>
>> It may have started a year ago but work on it has hardly been continuous.
>>
>>>> You are implying there is a problem here that will
>>>> impact numerous platforms (unless you can tell me why ppc is so different?)
>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a
>>> solution with the #defines, and that's correct for all, but not nice, I admit.
>>> (I don't really know about ARM, though).
>>> So if I can write down a nicer solution testing for methods that are evaluated
>>> by the C-compiler I'm happy.
>>>
>>> The problem is not that IRIW is not handled by the JMM, the problem
>>> is that
>>>     store
>>>     sync
>>> does not assure multiple-read-atomicity,
>>> only
>>>     sync
>>>     load
>>> does so on PPC. And you require multiple-read-atomicity to
>>> pass that test.
>>
>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>> can't find any reference to it.
>>
>> Thanks,
>> David
>>
>> The JMM is fine. And
>>>     store
>>>     MemBarVolatile
>>> is fine on x86, sparc etc. as there exist assembler instructions that
>>> do what is required.
>>>
>>> So if you are off soon, please let's come to a solution that
>>> might be improvable in the way it's implemented, but that
>>> allows us to implement a correct PPC64 port.
>>>
>>> Best regards,
>>> Goetz.
>>>
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>> To: Lindenmaier, Goetz
>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> Hi Goetz,
>>>
>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>> Hi everybody,
>>>>
>>>> thanks a lot for the detailed reviews!
>>>> I'll try to answer to all in one mail.
>>>>
>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
>>>> We don't think it's correct if we omit the barrier after initializing
>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev
>>>> and Doug Lea, and they agreed.
>>>> Also, concurrency torture tests
>>>>     LongVolatileTest
>>>>     AtomicIntegerInitialValueTest
>>>> will fail.
>>>> (In addition, observing 0 instead of the initial value of a volatile field would be
>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.)
>>>
>>> The effects of unsafe publication are always surprising - volatiles do
>>> not add anything special here. AFAIK there is nothing in the JMM that
>>> requires the constructor barrier - discussions with Doug and Aleksey
>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>> missing constructor barrier.
>>>
>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>>>> Yes, it costs measurable performance. But else it is wrong. We don't
>>>> see a way to implement this cheaper.
>>>>
>>>>> - these algorithms should be expressed using the correct OrderAccess operations
>>>> Basically, I agree on this. But you also have to take into account
>>>> that due to the different memory ordering instructions on different platforms
>>>> just implementing something empty is not sufficient.
>>>> An example:
>>>>     MemBarRelease  // means LoadStore, StoreStore barrier
>>>>     MemBarVolatile // means StoreLoad barrier
>>>> If these are consecutively in the code, sparc code looks like this:
>>>>     MemBarRelease  --> membar(Assembler::LoadStore | Assembler::StoreStore)
>>>>     MemBarVolatile --> membar(Assembler::StoreLoad)
>>>> Just doing what is required.
>>>> On Power, we get suboptimal code, as there are no comparable,
>>>> fine grained operations:
>>>>     MemBarRelease  --> lwsync // Doing LoadStore, StoreStore, LoadLoad
>>>>     MemBarVolatile --> sync   // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful,
>>>> I need an additional optimization that removes the lwsync. I can not implement
>>>> MemBarRelease empty, as it is also used independently.
>>>>
>>>> Back to the IRIW problem. I think here we have a comparable issue.
>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read
>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>
>>>> I would propose to guard the code by
>>>>     VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>     OrderAccess::cpu_is_multiple_read_atomic()
>>>> Else, David, how would you propose to implement this platform independent?
>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>
>>> I can not possibly answer to the necessary level of detail with a few
>>> moments thought. You are implying there is a problem here that will
>>> impact numerous platforms (unless you can tell me why ppc is so
>>> different?) and I can not take that on face value at the moment. The
>>> only reason I can see IRIW not being handled by the JMM requirements for
>>> volatile accesses is if there are global visibility issues that are not
>>> addressed - but even then I would expect heavy barriers at the store
>>> would deal with that, not at the load. (This situation reminds me of the
>>> need for read-barriers on Alpha architecture due to the use of software
>>> cache-coherency rather than hardware cache-coherency - but we don't have
>>> that on ppc!)
>>>
>>> Sorry - There is no quick resolution here and in a couple of days I will
>>> be heading out on vacation for two weeks.
>>>
>>> David
>>> -----
>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>> -- Other ports:
>>>> The IRIW issue requires at least 3 processors to be relevant, so it might
>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>>>
>>>> -- MemBarStoreStore after initialization
>>>> I agree we should not change it in the ppc port. If you wish, I can
>>>> prepare an extra webrev for hotspot-comp.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>> To: Vladimir Kozlov
>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>
>>>> Okay this is my second attempt at answering this in a reasonable way :)
>>>>
>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>> I have to ask David to do correctness evaluation.
>>>>
>>>> From what I understand what we see here is an attempt to fix an
>>>> existing issue with the implementation of volatiles so that the IRIW
>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>> volatile reads extremely heavyweight by adding a fence() when doing the
>>>> load.
>>>>
>>>> Now if this was purely handled in ppc64 source code then I would be
>>>> happy to let them do whatever they like (surely this kills performance
>>>> though!). But I do not agree with the changes to the shared code that
>>>> allow this solution to be implemented - even with PPC64_ONLY this is
>>>> polluting the shared code. My concern is similar to what I said with the
>>>> taskQueue changes - these algorithms should be expressed using the
>>>> correct OrderAccess operations to guarantee the desired properties
>>>> independent of architecture. If such a "barrier" is not needed on a
>>>> given architecture then the implementation in OrderAccess should reduce
>>>> to a no-op.
>>>>
>>>> And as Vitaly points out the constructor barriers are not needed under
>>>> the JMM.
>>>>
>>>>> I am fine with suggested changes because you did not change our current
>>>>> code for our platforms (please, do not change do_exits() now).
>>>>> But maybe it should be done using more general query which is set
>>>>> depending on platform:
>>>>>
>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>
>>>>> or similar to what we use now:
>>>>>
>>>>> VM_Version::needs_support_iriw_ordering()
>>>>
>>>> Every platform has to support IRIW this is simply part of the Java
>>>> Memory Model, there should not be any need to call this out explicitly
>>>> like this.
>>>>
>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>> From what I understand our ppc port is also affected. David?
>>>>
>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> In library_call.cpp can you add {}? New comment should be inside else {}.
>>>>>
>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>
>>>>> In do_put_xxx() can you combine your changes:
>>>>>
>>>>>     if (is_vol) {
>>>>>       // See comment in do_get_xxx().
>>>>> #ifndef PPC64
>>>>>       insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>> #else
>>>>>       if (is_field) {
>>>>>         // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>         set_wrote_volatile(true);
>>>>>       }
>>>>> #endif
>>>>>     }
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>> the torture test suite:
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>
>>>>>> Example:
>>>>>> volatile x=0, y=0
>>>>>>  __________    __________    __________    __________
>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>
>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>                 read(y)                     read(x)
>>>>>>
>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>
>>>>>>
>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>
>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>
>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>> written in constructors are visible before the reference gets
>>>>>> published.
>>>>>>
>>>>>>
>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>>> strong.
>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>> What do you think?
>>>>>>
>>>>>> Please review and test this change.
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
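The four-thread IRIW example quoted above can be checked mechanically. What follows is a minimal C++ sketch, not part of any webrev in this thread; every name in it (Op, Outcome, kThreads, explore, sc_outcomes, iriw_forbidden_outcome_reachable) is hypothetical. It assumes sequential consistency, enumerates every interleaving of the four threads that respects program order, and collects the values (r1, r2, r3, r4) read by Thread 1 (r1 = x, r2 = y) and Thread 3 (r3 = y, r4 = x). Under that assumption the outcome 1, 0, 1, 0, in which the two readers disagree about the order of the two writes, is never produced.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One memory operation of the litmus test.
struct Op {
    bool is_write;  // true: store 1 to var; false: load var into reg
    int var;        // 0 = x, 1 = y
    int reg;        // destination register index for loads (-1 for stores)
};

using Outcome = std::array<int, 4>;  // r1, r2, r3, r4

// The four threads of the example, each listed in program order.
static const std::vector<std::vector<Op>> kThreads = {
    {{true, 0, -1}},                 // Thread 0: x = 1
    {{false, 0, 0}, {false, 1, 1}},  // Thread 1: r1 = x; r2 = y
    {{true, 1, -1}},                 // Thread 2: y = 1
    {{false, 1, 2}, {false, 0, 3}},  // Thread 3: r3 = y; r4 = x
};

// Depth-first walk over all interleavings: at each step any thread that
// still has operations left may execute its next one against shared memory.
static void explore(std::array<std::size_t, 4> pc, std::array<int, 2> mem,
                    Outcome regs, std::set<Outcome>& out) {
    bool done = true;
    for (std::size_t t = 0; t < kThreads.size(); ++t) {
        if (pc[t] == kThreads[t].size()) continue;  // thread t has finished
        done = false;
        const Op& op = kThreads[t][pc[t]];
        auto next_pc = pc;
        auto next_mem = mem;
        auto next_regs = regs;
        ++next_pc[t];
        if (op.is_write) {
            next_mem[op.var] = 1;
        } else {
            next_regs[op.reg] = mem[op.var];
        }
        explore(next_pc, next_mem, next_regs, out);
    }
    if (done) out.insert(regs);  // all threads finished: record the result
}

// All register outcomes reachable under sequential consistency.
std::set<Outcome> sc_outcomes() {
    std::set<Outcome> out;
    std::array<std::size_t, 4> pc{};  // every thread starts at its first op
    std::array<int, 2> mem{};         // x = 0, y = 0 initially
    Outcome regs{};
    explore(pc, mem, regs, out);
    return out;
}

// True if Thread 1 can see x=1,y=0 while Thread 3 sees y=1,x=0.
bool iriw_forbidden_outcome_reachable() {
    return sc_outcomes().count(Outcome{1, 0, 1, 0}) > 0;
}
```

The sketch only models the behaviour the test expects from volatile accesses; it says nothing about which hardware barriers are needed to obtain that behaviour on a given CPU, which is the question discussed in the thread.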
>>>>>> From goetz.lindenmaier at sap.com Tue Jan 7 01:10:09 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 7 Jan 2014 09:10:09 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52CB9FAE.80800@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52CB9FAE.80800@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8A82A@DEWDFEMB12A.global.corp.sap> Hi David, happy new year for you too! The compiler uses the following operations for volatiles: MemBarRelease --> lwsync Store Load MemBarVolatile --> sync MemBarAcquire --> lwsync With our change we get: MemBarRelease --> lwsync MemBarVolatile --> sync Store Load MemBarAcquire --> lwsync So the lwsync in your example is added by the MemBarRelease before the Store. Best regards, Goetz. -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Dienstag, 7. Januar 2014 07:33 To: Lindenmaier, Goetz Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Hi Goetz, Happy New year! :) I'm backing up a step now that I have a better grip on the IRIW issue. On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: > Hi, > > ok, I understand the tests are wrong. It's good this issue is settled. > Thanks Aleksey and Andreas for going into the details of the proof! > > About our change: David, the causality is the other way round. 
> The change is about IRIW. > 1. To pass IRIW, we must use sync instructions before loads. Okay I see this (though I think we need to kill IRIW as a requirement and I don't think IRIW should form part of any conformance test). > 2. If we do syncs before loads, we don't need to do them after stores. True for the IRIW case but to prevent local reordering of volatile stores don't you still need some form of "storestore" barrier? initially: x = 0; y = 0 Thread 0: Thread 1: x = 1 r1 = x y = 1 r2 = y --------------------------- Should be Forbidden: r1 = 0 ^ r2 = 1 You can add the sync after each read but that doesn't prevent the stores in thread 0 from being locally re-ordered. So I think you still need at least lwsync on the writer side: initially: x = 0; y = 0 Thread 0: Thread 1: x = 1 r1 = x lwsync sync y = 1 r2 = y sync --------------------------- Forbidden: r1 = 0 ^ r2 = 1 David ----- > 3. If we don't do them after stores, we fail the volatile constructor tests. > 4. So finally we added them again at the end of the constructor after stores > to pass the volatile constructor tests. > > We originally passed the constructor tests because the ppc memory order > instructions are not as find-granular as the > operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction > on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the > MemBarVolatile after the store fixes the constructor tests. The proper representation > of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless > anyways. > >> I'm not happy with the ifdef approach but I won't block it. > I'd be happy to add a property > OrderAccess::cpu_is_multiple_copy_atomic() > or the like to guard the customization. I'd like that much better. Or also > OrderAccess::needs_support_iriw_ordering() > VM_Version::needs_support_iriw_ordering() > > > Best regards, > Goetz. 
> > > > > > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Donnerstag, 28. November 2013 00:34 > To: Lindenmaier, Goetz > Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > TL;DR version: > > Discussion on the c-i list has now confirmed that a constructor-barrier > for volatiles is not required as part of the JMM specification. It *may* > be required in an implementation that doesn't pre-zero memory to ensure > you can't see uninitialized fields. So the tests for this are invalid > and this part of the patch is not needed in general (ppc64 may need it > due to other factors). > > Re: "multiple copy atomicity" - first thanks for correcting the term :) > Second thanks for the reference to that paper! For reference: > > "The memory system (perhaps involving a hierarchy of buffers and a > complex interconnect) does not guarantee that a write becomes visible to > all other hardware threads at the same time point; these architectures > are not multiple-copy atomic." > > This is the visibility issue that I referred to and affects both ARM and > PPC. But of course it is normally handled by using suitable barriers > after the stores that need to be visible. I think the crux of the > current issue is what you wrote below: > > > The fixes for the constructor issue are only needed because we > > remove the sync instruction from behind stores (parse3.cpp:320) > > and place it before loads. > > I hadn't grasped this part. Obviously if you fail to do the sync after > the store then you have to do something around the loads to get the same > results! I still don't know what lead you to the conclusion that the > only way to fix the IRIW issue was to put the fence before the load - > maybe when I get the chance to read that paper in full it will be clearer. > > So ... 
the basic problem is that the current structure in the VM has > hard-wired one choice of how to get the right semantics for volatile > variables. You now want to customize that but not all the requisite > hooks are present. It would be better if volatile_load and > volatile_store were factored out so that they could be implemented as > desired per-platform. Alternatively there could be pre- and post- hooks > that could then be customized per platform. Otherwise you need > platform-specific ifdef's to handle it as per your patch. > > I'm not happy with the ifdef approach but I won't block it. I think this > is an area where a lot of clean up is needed in the VM. The barrier > abstractions are a confused mess in my opinion. > > Thanks, > David > ----- > > On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> I updated the webrev to fix the issues mentioned by Vladimir: >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >> >> I did not yet add the >> OrderAccess::needs_support_iriw_ordering() >> VM_Version::needs_support_iriw_ordering() >> or >> OrderAccess::cpu_is_multiple_copy_atomic() >> to reduce #defined, as I got no further comment on that. >> >> >> WRT to the validity of the tests and the interpretation of the JMM >> I feel not in the position to contribute substantially. >> >> But we would like to pass the torture test suite as we consider >> this a substantial task in implementing a PPC port. Also we think >> both tests show behavior a programmer would expect. It's bad if >> Java code runs fine on the more common x86 platform, and then >> fails on ppc. This will always first be blamed on the VM. >> >> The fixes for the constructor issue are only needed because we >> remove the sync instruction from behind stores (parse3.cpp:320) >> and place it before loads. Then there is no sync between volatile store >> and publishing the object. So we add it again in this one case >> (volatile store in constructor). >> >> >> @David >>>> Sure. 
There also is no solution as you require for the taskqueue problem yet, >>>> and that's being discussed now for almost a year. >>> It may have started a year ago but work on it has hardly been continuous. >> That's not true, we did a lot of investigation and testing on this issue. >> And we came up with a solution we consider the best possible. If you >> have objections, you should at least give the draft of a better solution, >> we would volunteer to implement and test it. >> Similarly, we invested time in fixing the concurrency torture issues. >> >> @David >>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>> can't find any reference to it. >> We learned about this reading "A Tutorial Introduction to the ARM and >> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for >> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >> and Francesco Zappa Nardelli (PPoPP '13) when analysing the taskqueue problem. >> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >> >> I was wrong in one thing, it's called multiple copy atomicity, I used 'read' >> instead. Sorry for that. (I also fixed that in the method name above). >> >> Best regards and thanks for all your involvements, >> Goetz. >> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Wednesday, 27 November 2013 12:53 >> To: Lindenmaier, Goetz >> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Hi Goetz, >> >> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>> Hi David, >>> >>> -- Volatile in constructor >>>> AFAIK we have not seen those tests fail due to a >>>> missing constructor barrier. >>> We see them on PPC64. Our test machines have typically 8-32 processors >>> and are Power 5-7.
But see also Aleksey's mail. (Thanks Aleksey!) >> >> And see follow ups - the tests are invalid. >> >>> -- IRIW issue >>>> I can not possibly answer to the necessary level of detail with a few >>>> moments thought. >>> Sure. There also is no solution as you require for the taskqueue problem yet, >>> and that's being discussed now for almost a year. >> >> It may have started a year ago but work on it has hardly been continuous. >> >>>> You are implying there is a problem here that will >>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>> solution with the #defines, and that's correct for all, but not nice, I admit. >>> (I don't really know about ARM, though). >>> So if I can write down a nicer solution testing for methods that are evaluated >>> by the C-compiler I'm happy. >>> >>> The problem is not that IRIW is not handled by the JMM, the problem >>> is that >>> store >>> sync >>> does not assure multiple-read-atomicity, >>> only >>> sync >>> load >>> does so on PPC. And you require multiple-read-atomicity to >>> pass that test. >> >> What is "multiple-read-atomicity"? I'm not familiar with the term and >> can't find any reference to it. >> >> Thanks, >> David >> >> The JMM is fine. And >>> store >>> MemBarVolatile >>> is fine on x86, sparc etc. as there exist assembler instructions that >>> do what is required. >>> >>> So if you are off soon, please let's come to a solution that >>> might be improvable in the way it's implemented, but that >>> allows us to implement a correct PPC64 port. >>> >>> Best regards, >>> Goetz. 
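For readers following the thread, the IRIW (Independent Reads of Independent Writes) pattern under discussion can be sketched as a small litmus test. This is only an illustration and not HotSpot code: it assumes C++11 `std::atomic` with `memory_order_seq_cst` as a stand-in for Java volatile fields, and the names `IriwResult`, `is_forbidden`, `run_iriw_once` and `count_forbidden` are hypothetical helpers invented for this sketch.

```cpp
#include <atomic>
#include <thread>

// What the two reader threads observed in one run (hypothetical helper type).
struct IriwResult {
    int r1_x, r1_y;  // reader 1 reads x first, then y
    int r2_y, r2_x;  // reader 2 reads y first, then x
};

// The outcome the thread calls "disallowed": reader 1 sees x written but not y,
// while reader 2 sees y written but not x, so the readers disagree about the
// order of the two independent writes.
inline bool is_forbidden(const IriwResult& r) {
    return r.r1_x == 1 && r.r1_y == 0 && r.r2_y == 1 && r.r2_x == 0;
}

// One run of the four-thread pattern. Sequentially consistent atomics are
// assumed here to model Java volatile accesses.
inline IriwResult run_iriw_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    IriwResult r{};

    std::thread writer_x([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread reader_1([&] {
        r.r1_x = x.load(std::memory_order_seq_cst);
        r.r1_y = y.load(std::memory_order_seq_cst);
    });
    std::thread writer_y([&] { y.store(1, std::memory_order_seq_cst); });
    std::thread reader_2([&] {
        r.r2_y = y.load(std::memory_order_seq_cst);
        r.r2_x = x.load(std::memory_order_seq_cst);
    });

    writer_x.join();
    reader_1.join();
    writer_y.join();
    reader_2.join();
    return r;
}

// Repeats the experiment and counts how often the disallowed outcome shows up.
inline int count_forbidden(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (is_forbidden(run_iriw_once())) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

With sequentially consistent accesses `count_forbidden(n)` is expected to return 0. A harness like this starts fresh threads on every run, so it rarely hits the interesting timing window; a count of 0 is therefore not evidence that weaker barriers would be sufficient.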
>>> >>> >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Tuesday, November 26, 2013 1:11 PM >>> To: Lindenmaier, Goetz >>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>> Hi everybody, >>>> >>>> thanks a lot for the detailed reviews! >>>> I'll try to answer to all in one mail. >>>> >>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>> We don't think it's correct if we omit the barrier after initializing >>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>> and Doug Lea, and they agreed. >>>> Also, concurrency torture tests >>>> LongVolatileTest >>>> AtomicIntegerInitialValueTest >>>> will fail. >>>> (In addition, observing 0 instead of the inital value of a volatile field would be >>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>> >>> The affects of unsafe publication are always surprising - volatiles do >>> not add anything special here. AFAIK there is nothing in the JMM that >>> requires the constructor barrier - discussions with Doug and Aleksey >>> notwithstanding. AFAIK we have not seen those tests fail due to a >>> missing constructor barrier. >>> >>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>> see a way to implement this cheaper. >>>> >>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>> Basically, I agree on this. 
But you also have to take into account >>>> that due to the different memory ordering instructions on different platforms >>>> just implementing something empty is not sufficient. >>>> An example: >>>> MemBarRelease // means LoadStore, StoreStore barrier >>>> MemBarVolatile // means StoreLoad barrier >>>> If these are consecutively in the code, sparc code looks like this: >>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>> Just doing what is required. >>>> On Power, we get suboptimal code, as there are no comparable, >>>> fine grained operations: >>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>> I need an additional optimization that removes the lwsync. I can not implement >>>> MemBarRelease empty, as it is also used independently. >>>> >>>> Back to the IRIW problem. I think here we have a comparable issue. >>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>> is inefficient on platforms that have multiple-read-atomicity. >>>> >>>> I would propose to guard the code by >>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>> OrderAccess::cpu_is_multiple_read_atomic() >>>> Else, David, how would you propose to implement this platform independent? >>>> (Maybe we can also use above method in taskqueue.hpp.) >>> >>> I can not possibly answer to the necessary level of detail with a few >>> moments thought. You are implying there is a problem here that will >>> impact numerous platforms (unless you can tell me why ppc is so >>> different?) and I can not take that on face value at the moment. 
The >>> only reason I can see IRIW not being handled by the JMM requirements for >>> volatile accesses is if there are global visibility issues that are not >>> addressed - but even then I would expect heavy barriers at the store >>> would deal with that, not at the load. (This situation reminds me of the >>> need for read-barriers on Alpha architecture due to the use of software >>> cache-coherency rather than hardware cache-coherency - but we don't have >>> that on ppc!) >>> >>> Sorry - There is no quick resolution here and in a couple of days I will >>> be heading out on vacation for two weeks. >>> >>> David >>> ----- >>> >>>> Best regards, >>>> Goetz. >>>> >>>> -- Other ports: >>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>> not happen on small machines. But I can use PPC_ONLY instead >>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>> >>>> -- MemBarStoreStore after initialization >>>> I agree we should not change it in the ppc port. If you wish, I can >>>> prepare an extra webrev for hotspot-comp. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>> To: Vladimir Kozlov >>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> Okay this is my second attempt at answering this in a reasonable way :) >>>> >>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>> I have to ask David to do correctness evaluation. >>>> >>>> From what I understand what we see here is an attempt to fix an >>>> existing issue with the implementation of volatiles so that the IRIW >>>> problem is addressed. 
The solution proposed for PPC64 is to make >>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>> load. >>>> >>>> Now if this was purely handled in ppc64 source code then I would be >>>> happy to let them do whatever they like (surely this kills performance >>>> though!). But I do not agree with the changes to the shared code that >>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>> polluting the shared code. My concern is similar to what I said with the >>>> taskQueue changes - these algorithms should be expressed using the >>>> correct OrderAccess operations to guarantee the desired properties >>>> independent of architecture. If such a "barrier" is not needed on a >>>> given architecture then the implementation in OrderAccess should reduce >>>> to a no-op. >>>> >>>> And as Vitaly points out the constructor barriers are not needed under >>>> the JMM. >>>> >>>>> I am fine with suggested changes because you did not change our current >>>>> code for our platforms (please, do not change do_exits() now). >>>>> But may be it should be done using more general query which is set >>>>> depending on platform: >>>>> >>>>> OrderAccess::needs_support_iriw_ordering() >>>>> >>>>> or similar to what we use now: >>>>> >>>>> VM_Version::needs_support_iriw_ordering() >>>> >>>> Every platform has to support IRIW this is simply part of the Java >>>> Memory Model, there should not be any need to call this out explicitly >>>> like this. >>>> >>>> Is there some subtlety of the hardware I am missing here? Are there >>>> visibility issues beyond the ordering constraints that the JMM defines? >>>>> From what I understand our ppc port is also affected. David? >>>> >>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>> >>>> David >>>> ----- >>>> >>>>> In library_call.cpp can you add {}? New comment should be inside else {}. 
>>>>>
>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>
>>>>> In do_put_xxx() can you combine your changes:
>>>>>
>>>>> if (is_vol) {
>>>>>   // See comment in do_get_xxx().
>>>>> #ifndef PPC64
>>>>>   insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>> #else
>>>>>   if (is_field) {
>>>>>     // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>     set_wrote_volatile(true);
>>>>>   }
>>>>> #endif
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>> the torture test suite:
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>
>>>>>> Example:
>>>>>> volatile x=0, y=0
>>>>>>  __________    __________    __________    __________
>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>
>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>                 read(y)                     read(x)
>>>>>>
>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>
>>>>>>
>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>
>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>
>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>> written in constructors are visible before the reference gets
>>>>>> published.
>>>>>>
>>>>>>
>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>>> strong.
>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>> What do you think?
>>>>>>
>>>>>> Please review and test this change.
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
From david.holmes at oracle.com Tue Jan 7 01:22:06 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 07 Jan 2014 19:22:06 +1000
Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8A82A@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap>
 <5293F087.2080700@oracle.com>
 <5293FE15.9050100@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap>
 <52948FF1.5080300@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap>
 <5295DD0B.3030604@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap>
 <52968167.4050906@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap>
 <52CB9FAE.80800@oracle.com>
 <4295855A5C1DE049A61835A1887419CC2CE8A82A@DEWDFEMB12A.global.corp.sap>
Message-ID: <52CBC73E.8060501@oracle.com>

On 7/01/2014 7:10 PM, Lindenmaier, Goetz wrote:
> Hi David,
>
> happy new year for you too!
>
> The compiler uses the following operations for volatiles:
>
> MemBarRelease --> lwsync
> Store                           Load
> MemBarVolatile --> sync         MemBarAcquire --> lwsync
>
> With our change we get:
>
> MemBarRelease --> lwsync        MemBarVolatile --> sync
> Store                           Load
>                                 MemBarAcquire --> lwsync
>
> So the lwsync in your example is added by the MemBarRelease before the
> Store.

Ah I see. Thanks for clarifying. I need to mull on this a little more.

David

> Best regards,
> Goetz.
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Tuesday, 7 January 2014 07:33
> To: Lindenmaier, Goetz
> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Hi Goetz,
>
> Happy New year! :)
>
> I'm backing up a step now that I have a better grip on the IRIW issue.
>
> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> ok, I understand the tests are wrong. It's good this issue is settled.
>> Thanks Aleksey and Andreas for going into the details of the proof!
>>
>> About our change: David, the causality is the other way round.
>> The change is about IRIW.
>> 1. To pass IRIW, we must use sync instructions before loads.
>
> Okay I see this (though I think we need to kill IRIW as a requirement
> and I don't think IRIW should form part of any conformance test).
>
>> 2. If we do syncs before loads, we don't need to do them after stores.
>
> True for the IRIW case but to prevent local reordering of volatile
> stores don't you still need some form of "storestore" barrier?
>
> initially: x = 0; y = 0
> Thread 0:       Thread 1:
> x = 1           r1 = x
> y = 1           r2 = y
> ---------------------------
> Should be Forbidden: r1 = 0 ^ r2 = 1
>
> You can add the sync after each read but that doesn't prevent the stores
> in thread 0 from being locally re-ordered. So I think you still need at
> least lwsync on the writer side:
>
> initially: x = 0; y = 0
> Thread 0:       Thread 1:
> x = 1           r1 = x
> lwsync          sync
> y = 1           r2 = y
>                 sync
> ---------------------------
> Forbidden: r1 = 0 ^ r2 = 1
>
> David
> -----
>
>> 3. If we don't do them after stores, we fail the volatile constructor tests.
>> 4. So finally we added them again at the end of the constructor after stores
>> to pass the volatile constructor tests.
>>
>> We originally passed the constructor tests because the ppc memory order
>> instructions are not as fine-granular as the
>> operations in the IR. MemBarVolatile is specified as StoreLoad.
The only instruction >> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the >> MemBarVolatile after the store fixes the constructor tests. The proper representation >> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless >> anyways. >> >>> I'm not happy with the ifdef approach but I won't block it. >> I'd be happy to add a property >> OrderAccess::cpu_is_multiple_copy_atomic() >> or the like to guard the customization. I'd like that much better. Or also >> OrderAccess::needs_support_iriw_ordering() >> VM_Version::needs_support_iriw_ordering() >> >> >> Best regards, >> Goetz. >> >> >> >> >> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 28. November 2013 00:34 >> To: Lindenmaier, Goetz >> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> TL;DR version: >> >> Discussion on the c-i list has now confirmed that a constructor-barrier >> for volatiles is not required as part of the JMM specification. It *may* >> be required in an implementation that doesn't pre-zero memory to ensure >> you can't see uninitialized fields. So the tests for this are invalid >> and this part of the patch is not needed in general (ppc64 may need it >> due to other factors). >> >> Re: "multiple copy atomicity" - first thanks for correcting the term :) >> Second thanks for the reference to that paper! For reference: >> >> "The memory system (perhaps involving a hierarchy of buffers and a >> complex interconnect) does not guarantee that a write becomes visible to >> all other hardware threads at the same time point; these architectures >> are not multiple-copy atomic." >> >> This is the visibility issue that I referred to and affects both ARM and >> PPC. 
But of course it is normally handled by using suitable barriers >> after the stores that need to be visible. I think the crux of the >> current issue is what you wrote below: >> >> > The fixes for the constructor issue are only needed because we >> > remove the sync instruction from behind stores (parse3.cpp:320) >> > and place it before loads. >> >> I hadn't grasped this part. Obviously if you fail to do the sync after >> the store then you have to do something around the loads to get the same >> results! I still don't know what lead you to the conclusion that the >> only way to fix the IRIW issue was to put the fence before the load - >> maybe when I get the chance to read that paper in full it will be clearer. >> >> So ... the basic problem is that the current structure in the VM has >> hard-wired one choice of how to get the right semantics for volatile >> variables. You now want to customize that but not all the requisite >> hooks are present. It would be better if volatile_load and >> volatile_store were factored out so that they could be implemented as >> desired per-platform. Alternatively there could be pre- and post- hooks >> that could then be customized per platform. Otherwise you need >> platform-specific ifdef's to handle it as per your patch. >> >> I'm not happy with the ifdef approach but I won't block it. I think this >> is an area where a lot of clean up is needed in the VM. The barrier >> abstractions are a confused mess in my opinion. >> >> Thanks, >> David >> ----- >> >> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I updated the webrev to fix the issues mentioned by Vladimir: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>> >>> I did not yet add the >>> OrderAccess::needs_support_iriw_ordering() >>> VM_Version::needs_support_iriw_ordering() >>> or >>> OrderAccess::cpu_is_multiple_copy_atomic() >>> to reduce #defined, as I got no further comment on that. 
>>> >>> >>> WRT to the validity of the tests and the interpretation of the JMM >>> I feel not in the position to contribute substantially. >>> >>> But we would like to pass the torture test suite as we consider >>> this a substantial task in implementing a PPC port. Also we think >>> both tests show behavior a programmer would expect. It's bad if >>> Java code runs fine on the more common x86 platform, and then >>> fails on ppc. This will always first be blamed on the VM. >>> >>> The fixes for the constructor issue are only needed because we >>> remove the sync instruction from behind stores (parse3.cpp:320) >>> and place it before loads. Then there is no sync between volatile store >>> and publishing the object. So we add it again in this one case >>> (volatile store in constructor). >>> >>> >>> @David >>>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>>> and that's being discussed now for almost a year. >>>> It may have started a year ago but work on it has hardly been continuous. >>> That's not true, we did a lot of investigation and testing on this issue. >>> And we came up with a solution we consider the best possible. If you >>> have objections, you should at least give the draft of a better solution, >>> we would volunteer to implement and test it. >>> Similarly, we invested time in fixing the concurrency torture issues. >>> >>> @David >>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>> can't find any reference to it. >>> We learned about this reading "A Tutorial Introduction to the ARM and >>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for >>> Weak Memory Models" by Nhat Minh L?, Antoniu Pop, Albert Cohen >>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem. 
>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>> >>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read' >>> instead. Sorry for that. (I also fixed that in the method name above). >>> >>> Best regards and thanks for all your involvements, >>> Goetz. >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Mittwoch, 27. November 2013 12:53 >>> To: Lindenmaier, Goetz >>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>> Hi David, >>>> >>>> -- Volatile in constuctor >>>>> AFAIK we have not seen those tests fail due to a >>>>> missing constructor barrier. >>>> We see them on PPC64. Our test machines have typically 8-32 processors >>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!) >>> >>> And see follow ups - the tests are invalid. >>> >>>> -- IRIW issue >>>>> I can not possibly answer to the necessary level of detail with a few >>>>> moments thought. >>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>> and that's being discussed now for almost a year. >>> >>> It may have started a year ago but work on it has hardly been continuous. >>> >>>>> You are implying there is a problem here that will >>>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>>> solution with the #defines, and that's correct for all, but not nice, I admit. >>>> (I don't really know about ARM, though). >>>> So if I can write down a nicer solution testing for methods that are evaluated >>>> by the C-compiler I'm happy. 
>>>> >>>> The problem is not that IRIW is not handled by the JMM, the problem >>>> is that >>>> store >>>> sync >>>> does not assure multiple-read-atomicity, >>>> only >>>> sync >>>> load >>>> does so on PPC. And you require multiple-read-atomicity to >>>> pass that test. >>> >>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>> can't find any reference to it. >>> >>> Thanks, >>> David >>> >>> The JMM is fine. And >>>> store >>>> MemBarVolatile >>>> is fine on x86, sparc etc. as there exist assembler instructions that >>>> do what is required. >>>> >>>> So if you are off soon, please let's come to a solution that >>>> might be improvable in the way it's implemented, but that >>>> allows us to implement a correct PPC64 port. >>>> >>>> Best regards, >>>> Goetz. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>> To: Lindenmaier, Goetz >>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> Hi Goetz, >>>> >>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>> Hi everybody, >>>>> >>>>> thanks a lot for the detailed reviews! >>>>> I'll try to answer to all in one mail. >>>>> >>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>>> We don't think it's correct if we omit the barrier after initializing >>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>>> and Doug Lea, and they agreed. >>>>> Also, concurrency torture tests >>>>> LongVolatileTest >>>>> AtomicIntegerInitialValueTest >>>>> will fail. 
>>>>> (In addition, observing 0 instead of the inital value of a volatile field would be >>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>>> >>>> The affects of unsafe publication are always surprising - volatiles do >>>> not add anything special here. AFAIK there is nothing in the JMM that >>>> requires the constructor barrier - discussions with Doug and Aleksey >>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>> missing constructor barrier. >>>> >>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>>> see a way to implement this cheaper. >>>>> >>>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>>> Basically, I agree on this. But you also have to take into account >>>>> that due to the different memory ordering instructions on different platforms >>>>> just implementing something empty is not sufficient. >>>>> An example: >>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>> MemBarVolatile // means StoreLoad barrier >>>>> If these are consecutively in the code, sparc code looks like this: >>>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>> Just doing what is required. >>>>> On Power, we get suboptimal code, as there are no comparable, >>>>> fine grained operations: >>>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>>> I need an additional optimization that removes the lwsync. I can not implement >>>>> MemBarRelease empty, as it is also used independently. >>>>> >>>>> Back to the IRIW problem. I think here we have a comparable issue. 
>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>> >>>>> I would propose to guard the code by >>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>> Else, David, how would you propose to implement this platform independent? >>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>> >>>> I can not possibly answer to the necessary level of detail with a few >>>> moments thought. You are implying there is a problem here that will >>>> impact numerous platforms (unless you can tell me why ppc is so >>>> different?) and I can not take that on face value at the moment. The >>>> only reason I can see IRIW not being handled by the JMM requirements for >>>> volatile accesses is if there are global visibility issues that are not >>>> addressed - but even then I would expect heavy barriers at the store >>>> would deal with that, not at the load. (This situation reminds me of the >>>> need for read-barriers on Alpha architecture due to the use of software >>>> cache-coherency rather than hardware cache-coherency - but we don't have >>>> that on ppc!) >>>> >>>> Sorry - There is no quick resolution here and in a couple of days I will >>>> be heading out on vacation for two weeks. >>>> >>>> David >>>> ----- >>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> -- Other ports: >>>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>> >>>>> -- MemBarStoreStore after initialization >>>>> I agree we should not change it in the ppc port. If you wish, I can >>>>> prepare an extra webrev for hotspot-comp. 
>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>> To: Vladimir Kozlov >>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>> >>>>> Okay this is my second attempt at answering this in a reasonable way :) >>>>> >>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>> I have to ask David to do correctness evaluation. >>>>> >>>>> From what I understand what we see here is an attempt to fix an >>>>> existing issue with the implementation of volatiles so that the IRIW >>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>>> load. >>>>> >>>>> Now if this was purely handled in ppc64 source code then I would be >>>>> happy to let them do whatever they like (surely this kills performance >>>>> though!). But I do not agree with the changes to the shared code that >>>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>>> polluting the shared code. My concern is similar to what I said with the >>>>> taskQueue changes - these algorithms should be expressed using the >>>>> correct OrderAccess operations to guarantee the desired properties >>>>> independent of architecture. If such a "barrier" is not needed on a >>>>> given architecture then the implementation in OrderAccess should reduce >>>>> to a no-op. >>>>> >>>>> And as Vitaly points out the constructor barriers are not needed under >>>>> the JMM. >>>>> >>>>>> I am fine with suggested changes because you did not change our current >>>>>> code for our platforms (please, do not change do_exits() now). 
>>>>>> But maybe it should be done using a more general query which is set
>>>>>> depending on platform:
>>>>>>
>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>
>>>>>> or similar to what we use now:
>>>>>>
>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>
>>>>> Every platform has to support IRIW; this is simply part of the Java
>>>>> Memory Model, there should not be any need to call this out explicitly
>>>>> like this.
>>>>>
>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>>> From what I understand our ppc port is also affected. David?
>>>>>
>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> In library_call.cpp can you add {}? New comment should be inside else {}.
>>>>>>
>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>
>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>
>>>>>>     if (is_vol) {
>>>>>>       // See comment in do_get_xxx().
>>>>>> #ifndef PPC64
>>>>>>       insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>> #else
>>>>>>       if (is_field) {
>>>>>>         // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>>         set_wrote_volatile(true);
>>>>>>       }
>>>>>> #endif
>>>>>>     }
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>> the torture test suite:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>
>>>>>>> Example:
>>>>>>> volatile x=0, y=0
>>>>>>>  __________    __________    __________    __________
>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>
>>>>>>>  write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>                read(y)                     read(x)
>>>>>>>
>>>>>>> Disallowed:    x=1, y=0                    y=1, x=0
>>>>>>>
>>>>>>>
>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>
>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>
>>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>>> written in constructors are visible before the reference gets
>>>>>>> published.
>>>>>>>
>>>>>>>
>>>>>>> Looking at the code, we found a MemBarRelease that, to us, seems too
>>>>>>> strong.
>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Please review and test this change.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
>>>>>>>

From goetz.lindenmaier at sap.com Tue Jan 7 08:49:23 2014
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 7 Jan 2014 16:49:23 +0000
Subject: RFR (M): 8031319: PPC64: Some fixes in ppc and aix coding.
Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8AAEE@DEWDFEMB12A.global.corp.sap> Hi, This change contains a row of fixes in the ppc and aix coding we collected over the last testing days. Please review this change. http://cr.openjdk.java.net/~goetz/webrevs/8031319-ppcFix/ In more detail: Stack overflow issue: code branched to zero. Fix: Generate throw_StackOverflowError stub before generating the native entry. Fix assertion when looking for nmethod in code cache: use unsafe accessor. Also optimize: don't look up the address twice. Fix compressing klass pointers. Implement some missing stuff in the aix os branch. Thanks and best regards, Goetz. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140107/d685cb63/attachment-0001.html From volker.simonis at gmail.com Tue Jan 7 09:27:37 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 7 Jan 2014 18:27:37 +0100 Subject: RFR (M): 8031319: PPC64: Some fixes in ppc and aix coding. In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8AAEE@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE8AAEE@DEWDFEMB12A.global.corp.sap> Message-ID: Hi, we've forgotten a SAP JVM comment in the code in os_aix.cpp in the method: bool os::find(address addr, outputStream* st) Sorry, that was my fault. It will be removed in the final version. Regards, Volker On Tue, Jan 7, 2014 at 5:49 PM, Lindenmaier, Goetz wrote: > Hi, > > > > This change contains a row of fixes in the ppc and aix coding > > we collected over the last testing days. > > Please review this change. > > http://cr.openjdk.java.net/~goetz/webrevs/8031319-ppcFix/ > > > > In more detail: > > > Stack overflow issue: code branched to zero. > Fix: Generate throw_StackOverflowError stub before generating > the native entry. > > Fix assertion when looking for nmethod in code cache: > use unsafe accessor. 
Also optimize: don't look up the > address twice. > > Fix compressing klass pointers. > > Implement some missing stuff in the aix os branch. > > > > Thanks and best regards, > > Goetz. From vladimir.kozlov at oracle.com Tue Jan 7 09:56:31 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Jan 2014 09:56:31 -0800 Subject: RFR (M): 8031319: PPC64: Some fixes in ppc and aix coding. In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8AAEE@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE8AAEE@DEWDFEMB12A.global.corp.sap> Message-ID: <52CC3FCF.3070608@oracle.com> Looks fine to me. You can push it since it does not affect shared code. Thanks, Vladimir On 1/7/14 8:49 AM, Lindenmaier, Goetz wrote: > Hi, > > This change contains a row of fixes in the ppc and aix coding > > we collected over the last testing days. > > Please review this change. > > http://cr.openjdk.java.net/~goetz/webrevs/8031319-ppcFix/ > > In more detail: > > > Stack overflow issue: code branched to zero. > Fix: Generate throw_StackOverflowError stub before generating > the native entry. > > Fix assertion when looking for nmethod in code cache: > use unsafe accessor. Also optimize: don't look up the > address twice. > > Fix compressing klass pointers. > > Implement some missing stuff in the aix os branch. > > Thanks and best regards, > > Goetz. > From goetz.lindenmaier at sap.com Tue Jan 7 11:21:23 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 7 Jan 2014 19:21:23 +0000 Subject: RFR (M): 8031319: PPC64: Some fixes in ppc and aix coding. In-Reply-To: <52CC3FCF.3070608@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE8AAEE@DEWDFEMB12A.global.corp.sap> <52CC3FCF.3070608@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8AB45@DEWDFEMB12A.global.corp.sap> Hi Vladimir, thanks for the review, and for testing the other change yesterday. I pushed it, including the fix to the comment. Best regards, Goetz. 
-----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, January 07, 2014 6:57 PM To: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' Subject: Re: RFR (M): 8031319: PPC64: Some fixes in ppc and aix coding. Looks fine to me. You can push it since it does not affect shared code. Thanks, Vladimir On 1/7/14 8:49 AM, Lindenmaier, Goetz wrote: > Hi, > > This change contains a row of fixes in the ppc and aix coding > > we collected over the last testing days. > > Please review this change. > > http://cr.openjdk.java.net/~goetz/webrevs/8031319-ppcFix/ > > In more detail: > > > Stack overflow issue: code branched to zero. > Fix: Generate throw_StackOverflowError stub before generating > the native entry. > > Fix assertion when looking for nmethod in code cache: > use unsafe accessor. Also optimize: don't look up the > address twice. > > Fix compressing klass pointers. > > Implement some missing stuff in the aix os branch. > > Thanks and best regards, > > Goetz. > From goetz.lindenmaier at sap.com Tue Jan 7 11:16:27 2014 From: goetz.lindenmaier at sap.com (goetz.lindenmaier at sap.com) Date: Tue, 07 Jan 2014 19:16:27 +0000 Subject: hg: ppc-aix-port/stage/hotspot: 8031319: PPC64: Some fixes in ppc and aix coding. Message-ID: <20140107191630.D632B62435@hg.openjdk.java.net> Changeset: c668f307a4c0 Author: goetz Date: 2014-01-07 17:24 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage/hotspot/rev/c668f307a4c0 8031319: PPC64: Some fixes in ppc and aix coding. Reviewed-by: kvn ! src/cpu/ppc/vm/cppInterpreter_ppc.cpp ! src/cpu/ppc/vm/macroAssembler_ppc.cpp ! src/cpu/ppc/vm/nativeInst_ppc.cpp ! src/cpu/ppc/vm/nativeInst_ppc.hpp ! src/cpu/ppc/vm/ppc.ad ! src/cpu/ppc/vm/stubGenerator_ppc.cpp ! src/os/aix/vm/os_aix.cpp ! 
src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp From goetz.lindenmaier at sap.com Wed Jan 8 06:16:29 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 8 Jan 2014 14:16:29 +0000 Subject: Build of new stage-9 repository Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8AE8B@DEWDFEMB12A.global.corp.sap> Hi, I installed a nightly build for the new stage-9 repository. As expected, the repository builds fine. You can see the results on the known web site: http://cr.openjdk.java.net/~simonis/ppc-aix-port/ More build results will show up tomorrow. I removed the column with the build results of the ppc-aix-port/jdk8 repository from the table, as these are now outdated. Best regards, Goetz. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140108/ec9cbada/attachment.html From vladimir.kozlov at oracle.com Wed Jan 8 13:40:32 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 21:40:32 +0000 Subject: hg: ppc-aix-port/stage-9/hotspot: 3 new changesets Message-ID: <20140108214041.A7C9062472@hg.openjdk.java.net> Changeset: 068a5117af73 Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/068a5117af73 Added tag jdk9-b00 for changeset ce2d7e46f3c7 ! .hgtags Changeset: 050a626a8895 Author: iris Date: 2013-12-13 09:35 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/050a626a8895 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! 
.jcheck/conf Changeset: 134e52455808 Author: kvn Date: 2014-01-08 11:24 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/134e52455808 Merge From vladimir.kozlov at oracle.com Wed Jan 8 14:13:36 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 22:13:36 +0000 Subject: hg: ppc-aix-port/stage-9/jaxws: 2 new changesets Message-ID: <20140108221343.A8EA362479@hg.openjdk.java.net> Changeset: 1eeaba183340 Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jaxws/rev/1eeaba183340 Added tag jdk9-b00 for changeset 32050ab53c8a ! .hgtags Changeset: 9c9fabbcd3d5 Author: iris Date: 2013-12-13 09:35 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jaxws/rev/9c9fabbcd3d5 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! .jcheck/conf From vladimir.kozlov at oracle.com Wed Jan 8 14:13:48 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 22:13:48 +0000 Subject: hg: ppc-aix-port/stage-9/langtools: 2 new changesets Message-ID: <20140108221359.C8C2E6247A@hg.openjdk.java.net> Changeset: 8ec7991f0968 Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/langtools/rev/8ec7991f0968 Added tag jdk9-b00 for changeset afe63d41c699 ! .hgtags Changeset: 3edb37befdd0 Author: iris Date: 2013-12-13 09:36 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/langtools/rev/3edb37befdd0 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! 
.jcheck/conf From vladimir.kozlov at oracle.com Wed Jan 8 14:14:06 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 22:14:06 +0000 Subject: hg: ppc-aix-port/stage-9/nashorn: 2 new changesets Message-ID: <20140108221409.C7FE26247B@hg.openjdk.java.net> Changeset: 75f66e787d11 Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/nashorn/rev/75f66e787d11 Added tag jdk9-b00 for changeset 32631eed0fad ! .hgtags Changeset: 65347535840f Author: iris Date: 2013-12-13 09:36 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/nashorn/rev/65347535840f 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! .jcheck/conf From vladimir.kozlov at oracle.com Wed Jan 8 14:13:25 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 22:13:25 +0000 Subject: hg: ppc-aix-port/stage-9/jaxp: 2 new changesets Message-ID: <20140108221332.5F93162478@hg.openjdk.java.net> Changeset: cec9e799464d Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jaxp/rev/cec9e799464d Added tag jdk9-b00 for changeset 4045edd35e8b ! .hgtags Changeset: 9ea0852c341f Author: iris Date: 2013-12-13 09:35 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jaxp/rev/9ea0852c341f 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! .jcheck/conf From vladimir.kozlov at oracle.com Wed Jan 8 14:13:20 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 22:13:20 +0000 Subject: hg: ppc-aix-port/stage-9/corba: 2 new changesets Message-ID: <20140108221322.9FA6C62476@hg.openjdk.java.net> Changeset: 5eb0c07843fc Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/corba/rev/5eb0c07843fc Added tag jdk9-b00 for changeset a7d3638deb2f ! 
.hgtags Changeset: 27ab4532fc80 Author: iris Date: 2013-12-13 09:35 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/corba/rev/27ab4532fc80 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! .jcheck/conf From vladimir.kozlov at oracle.com Wed Jan 8 14:13:17 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 22:13:17 +0000 Subject: hg: ppc-aix-port/stage-9: 3 new changesets Message-ID: <20140108221318.50A0F62474@hg.openjdk.java.net> Changeset: c2b11b3fa1df Author: iris Date: 2013-12-12 15:34 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/rev/c2b11b3fa1df Added tag jdk9-b00 for changeset 1e1f86d5d4e2 ! .hgtags Changeset: 4ca37b9541a7 Author: iris Date: 2013-12-13 09:34 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/rev/4ca37b9541a7 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! .jcheck/conf Changeset: 06dd913373b4 Author: kvn Date: 2014-01-08 11:20 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/rev/06dd913373b4 Merge From vladimir.kozlov at oracle.com Wed Jan 8 15:17:49 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 23:17:49 +0000 Subject: hg: ppc-aix-port/stage-9/jdk: 3 new changesets Message-ID: <20140108231853.CFD106247C@hg.openjdk.java.net> Changeset: 8d3141b5734b Author: iris Date: 2013-12-12 15:27 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jdk/rev/8d3141b5734b Added tag jdk9-b00 for changeset 27b384262cba ! .hgtags Changeset: e554994dd16a Author: iris Date: 2013-12-13 09:36 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jdk/rev/e554994dd16a 8030068: Update .jcheck/conf files for JDK 9 Reviewed-by: mr ! 
.jcheck/conf Changeset: c1e75d17b993 Author: kvn Date: 2014-01-08 11:19 -0800 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jdk/rev/c1e75d17b993 Merge From vladimir.kozlov at oracle.com Wed Jan 8 15:31:47 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 08 Jan 2014 23:31:47 +0000 Subject: hg: ppc-aix-port/stage-9/hotspot: 3 new changesets Message-ID: <20140108233157.830AE6247D@hg.openjdk.java.net> Changeset: ad6695638a35 Author: goetz Date: 2013-12-20 13:51 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/ad6695638a35 8030863: PPC64: (part 220): ConstantTableBase for calls between args and jvms Summary: Add ConstantTableBase node edge after parameters and before jvms. Adapt jvms offsets. Reviewed-by: kvn ! src/cpu/ppc/vm/ppc.ad ! src/share/vm/adlc/archDesc.cpp ! src/share/vm/adlc/archDesc.hpp ! src/share/vm/adlc/main.cpp ! src/share/vm/adlc/output_c.cpp ! src/share/vm/adlc/output_h.cpp ! src/share/vm/opto/callnode.cpp ! src/share/vm/opto/callnode.hpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/matcher.cpp Changeset: c3efa8868779 Author: goetz Date: 2014-01-06 11:02 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/c3efa8868779 8031188: Fix for 8029015: PPC64 (part 216): opto: trap based null and range checks Summary: Swap the Projs in the block list so that the new block is added behind the proper node. Reviewed-by: kvn ! src/share/vm/opto/block.cpp Changeset: b858620b0081 Author: goetz Date: 2014-01-07 17:24 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/b858620b0081 8031319: PPC64: Some fixes in ppc and aix coding. Reviewed-by: kvn ! src/cpu/ppc/vm/cppInterpreter_ppc.cpp ! src/cpu/ppc/vm/macroAssembler_ppc.cpp ! src/cpu/ppc/vm/nativeInst_ppc.cpp ! src/cpu/ppc/vm/nativeInst_ppc.hpp ! src/cpu/ppc/vm/ppc.ad ! src/cpu/ppc/vm/stubGenerator_ppc.cpp ! src/os/aix/vm/os_aix.cpp ! 
src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp From goetz.lindenmaier at sap.com Mon Jan 13 02:40:44 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 13 Jan 2014 10:40:44 +0000 Subject: Red lights on build overview page. Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8BC59@DEWDFEMB12A.global.corp.sap> Hi, Somebody may have noticed that the builds on our build result page are not passing: http://cr.openjdk.java.net/~simonis/ppc-aix-port/index.html This is not a problem of the port - we have an issue with the disc space for the builds. We are working on this. Best regards, Goetz. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140113/f01a1832/attachment.html From volker.simonis at gmail.com Tue Jan 14 00:40:55 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 14 Jan 2014 09:40:55 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests Message-ID: Hi, could you please review the following changes for the ppc-aix-port stage/stage-9 repositories (the changes are planned for integration into ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): http://cr.openjdk.java.net/~simonis/webrevs/8031581/ I've build and smoke tested without any problems on Linux/x86_64 and PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. 
With these changes (and together with the changes from "8028537: PPC64: Updated jdk/test scripts to understand the AIX os and environment" and "8031134 : PPC64: implement printing on AIX") our port passes all but the following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 baseline from www.java.net/download/jdk8/testresults/testresults.html?): java/net/Inet6Address/B6558853.java java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) java/nio/channels/Selector/RacyDeregister.java sun/security/krb5/auto/Unreachable.java (only on IPv6) Thank you and best regards, Volker Following a detailed description of the various changes: src/share/native/java/util/zip/zip_util.c src/share/native/sun/management/DiagnosticCommandImpl.c - According to ISO C it is perfectly legal for malloc to return zero if called with a zero argument. Fix various places where malloc can potentially correctly return zero because it was called with a zero argument. - Also fixed DiagnosticCommandImpl.c to include stdlib.h. This only fixes a compiler warning on Linux, but on AIX it prevents a VM crash later on because the return value of malloc() will be casted to int which is especially bad if that pointer was bigger than 32-bit. make/CompileJavaClasses.gmk - Also use PollingWatchService on AIX. make/lib/NioLibraries.gmk src/aix/native/sun/nio/ch/AixNativeThread.c - Put the implementation for the native methods of NativeThread into AixNativeThread.c on AIX. src/solaris/native/sun/nio/ch/PollArrayWrapper.c src/solaris/native/sun/nio/ch/Net.c src/aix/classes/sun/nio/ch/AixPollPort.java src/aix/native/sun/nio/ch/AixPollPort.c src/aix/native/java/net/aix_close.c - On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT, ...) are defined to different values than on other operating systems. 
The problem is however, that these constants are hardcoded as public final static members of various, shared Java classes. We therefore have to map them from Java to native every time before calling one of the native poll functions and back to Java after the call on AIX in order to get the right semantics. src/share/classes/java/nio/file/CopyMoveHelper.java - As discussed on the core-libs mailing list (see http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) it is not necessary to call Files.getFileAttributeView() with any linkOptions because at that place we've already checked that the target file can not be a symbolic link. This change makes the implementation more robust on platforms which support symbolic links but do not support the O_NOFOLLOW flag to the open system call. It also makes the JDK pass the demo/zipfs/basic.sh test on AIX. src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java - Support "compound text" on AIX in the same way like on other Unix platforms. src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider - Define the correct attach provider for AIX. src/solaris/native/java/net/net_util_md.h src/solaris/native/sun/nio/ch/FileDispatcherImpl.c src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c - AIX needs a workaround for I/O cancellation (see: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). "..The close() subroutine is blocked until all subroutines which use the file descriptor return to usr space. For example, when a thread is calling close and another thread is calling select with the same file descriptor, the close subroutine does not return until the select call returns...". To fix this problem, we have to use the various NET_ wrappers which are declared in net_util_md.h and defined in aix_close.c and we also need some additional wrappers for fcntl(), read() and write() on AIX. 
  While the current solution isn't really nice because it introduces some
  more AIX-specific sections in shared code, I think it is the best way to
  go for JDK 8 because it imposes the smallest possible changes and risks
  for the existing platforms. I'm ready to change the code to
  unconditionally use the wrappers for all platforms and implement the
  wrappers empty on platforms which don't need any wrapping. I think it
  would also be nice to clean up the names (e.g. NET_Read() is currently a
  wrapper for recv() and the NET_ prefix is probably not appropriate any
  more so maybe change it to something like IO_). But again, I'd prefer to
  keep that as a follow-up change for JDK9.
- Calling fsync() on a "read-only" file descriptor on AIX will result in
  an error (i.e. "EBADF: The FileDescriptor parameter is not a valid file
  descriptor open for writing."). To prevent this error we have to query
  if the corresponding file descriptor is writable. Notice that at that
  point we can not access the writable attribute of the corresponding file
  channel so we have to use fcntl().

src/solaris/classes/java/lang/UNIXProcess.java.aix

- On AIX the implementation is especially tricky, because the close()
  system call will block if another thread is at the same time blocked in
  a file operation (e.g. 'read()') on the same file descriptor. We
  therefore combine the AIX ProcessPipeInputStream implementation with the
  DeferredCloseInputStream approach used on Solaris (see
  UNIXProcess.java.solaris). This means that every potentially blocking
  operation on the file descriptor increments a counter before it is
  executed and decrements it once it finishes. The 'close()' operation
  will only be executed if there are no pending operations. Otherwise it
  is deferred until after the last pending operation has finished.

src/share/transport/socket/socketTransport.c

- On AIX we have to call shutdown() on a socket descriptor before closing
  it, otherwise the close() call may be blocked.
  This is the same problem as described before. Unfortunately the JDI
  framework doesn't use the same IO wrappers as other class library
  components so we can not easily use the NET_ abstractions from
  aix_close.c here.
- Without this small change all JDI regression tests will fail on AIX
  because of the way the tests act as a "debugger" which launches another
  VM (the "debuggee") which connects itself back to the debugger. In this
  scenario the "debuggee" can not shut itself down because one thread will
  always be blocked in the close() call on one of the communication
  sockets.

src/solaris/native/java/net/NetworkInterface.c

- Set the scope identifier for IPv6 addresses on AIX.

src/solaris/native/java/net/net_util_md.c

- It turns out that we do not always have to replace SO_REUSEADDR on AIX
  by SO_REUSEPORT. Instead we can simply use the same approach as BSD and
  only use SO_REUSEPORT additionally, if several datagram sockets try to
  bind to the same port.
- Also fixed a comment and removed unused local variables.
- Fixed the obviously inverted assignment newTime = prevTime; which should
  read prevTime = newTime;. Otherwise prevTime will never change and the
  timeout will potentially be reached too early.

src/solaris/native/sun/management/OperatingSystemImpl.c

- AIX does not understand /proc/self so we have to query the real process
  ID to access the proc file system.

src/solaris/native/sun/nio/ch/DatagramChannelImpl.c

- On AIX, connect() may legally return EAFNOSUPPORT if called on a socket
  with the address family set to AF_UNSPEC.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140114/866bb06d/attachment.html From Alan.Bateman at oracle.com Tue Jan 14 01:19:20 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 14 Jan 2014 09:19:20 +0000 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: Message-ID: <52D50118.3080000@oracle.com> On 14/01/2014 08:40, Volker Simonis wrote: > Hi, > > could you please review the following changes for the ppc-aix-port > stage/stage-9 repositories (the changes are planned for integration > into ppc-aix-port/stage-9 and subsequent backporting to > ppc-aix-port/stage): I'd like to review this but I won't have time until later in the week. From an initial look then there are a few things are not pretty (the changes to fix the AIX problems with I/O cancellation in particular) and I suspect that some refactoring is going to be required to handle some of this cleanly. A minor comment is that bug synopsis doesn't really communicate what these changes are about. -Alan. From volker.simonis at gmail.com Tue Jan 14 01:56:55 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 14 Jan 2014 10:56:55 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: <52D50118.3080000@oracle.com> References: <52D50118.3080000@oracle.com> Message-ID: Hi Alan, On Tue, Jan 14, 2014 at 10:19 AM, Alan Bateman wrote: > On 14/01/2014 08:40, Volker Simonis wrote: > >> Hi, >> >> could you please review the following changes for the ppc-aix-port >> stage/stage-9 repositories (the changes are planned for integration into >> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): >> > > I'd like to review this but I won't have time until later in the week. Thanks, that would be great. 
> From an initial look then there are a few things are not pretty (the
> changes to fix the AIX problems with I/O cancellation in particular) and I
> suspect that some refactoring is going to be required to handle some of
> this cleanly.

I agree, but as I wrote, this change is intended to finally go into both
jdk9 and jdk8u and I didn't want to do too much refactoring for jdk8u.
Once it's in and we have a working port I'd be happy to work on refactoring
the code, but I think this should be done in jdk9 where we have more time
and less risk of changing the behaviour on the existing platforms.

> A minor comment is that bug synopsis doesn't really communicate what these
> changes are about.
>

This is the last "bulk change" which addresses issues in several different
areas of the class library. Follow-up changes will be more specific, with
better bug synopses (I promise :).

Regards,
Volker

> -Alan.
>

From david.holmes at oracle.com Tue Jan 14 03:29:48 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 14 Jan 2014 21:29:48 +1000
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To:
References:
Message-ID: <52D51FAC.8060800@oracle.com>

Just a note on this part (I haven't looked at the code):

> On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT,
> ...) are defined to different values than on other operating systems. The
> problem is however, that these constants are hardcoded as public final
> static members of various, shared Java classes.

Sounds like this should be handled the same way that the other "system
constants" are handled - you can either store a platform file in the repo
(for cross-compiling) or you generate the class containing the constants
at build time.
David On 14/01/2014 6:40 PM, Volker Simonis wrote: > Hi, > > could you please review the following changes for the ppc-aix-port > stage/stage-9 repositories (the changes are planned for integration into > ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): > > http://cr.openjdk.java.net/~simonis/webrevs/8031581/ > > I've build and smoke tested without any problems on Linux/x86_64 and > PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. > > With these changes (and together with the changes from "8028537: PPC64: > Updated jdk/test scripts to understand the AIX os and environment" and > "8031134 : PPC64: implement printing on AIX") our port passes all but > the following 7 jtreg regression tests on AIX (compared to the > Linux/x86_64 baseline from > www.java.net/download/jdk8/testresults/testresults.html > ?): > > java/net/Inet6Address/B6558853.java > java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) > java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java > java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) > java/nio/channels/Selector/RacyDeregister.java > sun/security/krb5/auto/Unreachable.java (only on IPv6) > > Thank you and best regards, > Volker > > > Following a detailed description of the various changes: > > > src/share/native/java/util/zip/zip_util.c > src/share/native/sun/management/DiagnosticCommandImpl.c > > * According to ISO C it is perfectly legal for malloc to return zero > if called with a zero argument. Fix various places where malloc can > potentially correctly return zero because it was called with a zero > argument. > * Also fixed |DiagnosticCommandImpl.c| to include |stdlib.h|. This > only fixes a compiler warning on Linux, but on AIX it prevents a VM > crash later on because the return value of |malloc()| will be casted > to |int| which is especially bad if that pointer was bigger than > 32-bit. 
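For illustration, the two malloc() points above can be sketched in C roughly as follows. This is only a minimal sketch, not code from the webrev: the helper name alloc_buffer and its out-parameter are hypothetical. It treats a NULL result as a failure only when a non-zero size was requested, and it includes stdlib.h so that malloc() is properly declared as returning a pointer rather than being implicitly assumed to return int.

```c
#include <stddef.h>
#include <stdlib.h>   /* declares malloc()/free(); without it older C compilers
                         implicitly assume malloc() returns int, which
                         truncates a 64-bit pointer */

/*
 * Hypothetical helper (not from the webrev): allocate n bytes and report
 * failure only if a non-zero request returned NULL. ISO C allows
 * malloc(0) to return either NULL or a unique pointer, so NULL alone is
 * not an error when n == 0.
 */
static void *alloc_buffer(size_t n, int *failed) {
    void *p = malloc(n);
    *failed = (p == NULL && n != 0);
    return p;
}
```

A caller would check the failed flag instead of comparing the result with NULL, and may pass the result to free() in either case, since free(NULL) is a no-op.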
> > > make/CompileJavaClasses.gmk > > * Also use |PollingWatchService| on AIX. > > > make/lib/NioLibraries.gmk > src/aix/native/sun/nio/ch/AixNativeThread.c > > * Put the implementation for the native methods of |NativeThread| into > |AixNativeThread.c| on AIX. > > > src/solaris/native/sun/nio/ch/PollArrayWrapper.c > src/solaris/native/sun/nio/ch/Net.c > src/aix/classes/sun/nio/ch/AixPollPort.java > src/aix/native/sun/nio/ch/AixPollPort.c > src/aix/native/java/net/aix_close.c > > * On AIX, the constants used for the polling events (i.e. |POLLIN|, > |POLLOUT|, ...) are defined to different values than on other > operating systems. The problem is however, that these constants are > hardcoded as public final static members of various, shared Java > classes. We therefore have to map them from Java to native every > time before calling one of the native poll functions and back to > Java after the call on AIX in order to get the right semantics. > > > src/share/classes/java/nio/file/CopyMoveHelper.java > > * As discussed on the core-libs mailing list (see > http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) > it is not necessary to call |Files.getFileAttributeView()| with any > |linkOptions| because at that place we've already checked that the > target file can not be a symbolic link. This change makes the > implementation more robust on platforms which support symbolic links > but do not support the |O_NOFOLLOW| flag to the |open| system call. > It also makes the JDK pass the |demo/zipfs/basic.sh| test on AIX. > > > src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java > > * Support "compound text" on AIX in the same way like on other Unix > platforms. > > > src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider > > * Define the correct attach provider for AIX. 
> > > src/solaris/native/java/net/net_util_md.h > src/solaris/native/sun/nio/ch/FileDispatcherImpl.c > src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c > > * AIX needs a workaround for I/O cancellation (see: > http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). > "..The |close()| subroutine is blocked until all subroutines which > use the file descriptor return to usr space. For example, when a > thread is calling close and another thread is calling select with > the same file descriptor, the close subroutine does not return until > the select call returns...". To fix this problem, we have to use the > various |NET_| wrappers which are declared in |net_util_md.h| and > defined in |aix_close.c| and we also need some additional wrappers > for |fcntl()|, |read()| and |write()| on AIX. > While the current solution isn't really nice because it introduces > some more AIX-specifc sections in shared code, I think it is the > best way to go for JDK 8 because it imposes the smallest possible > changes and risks for the existing platforms. I'm ready to change > the code to unconditionally use the wrappers for all platforms and > implement the wrappers empty on platforms which don't need any > wrapping. I think it would also be nice to clean up the names (e.g. > |NET_Read()| is currently a wrapper for |recv()| and the |NET_| > prefix is probably not appropriate any more so maybe change it to > something like |IO_|). But again, I'll prefer to keep that as a > follow up change for JDK9. > * Calling |fsync()| on a "read-only" file descriptor on AIX will > result in an error (i.e. "EBADF: The FileDescriptor parameter is not > a valid file descriptor open for writing."). To prevent this error > we have to query if the corresponding file descriptor is writeable. > Notice that at that point we can not access the |writable| attribute > of the corresponding file channel so we have to use |fcntl()|. 
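The |fsync()| point above can be sketched as follows. This is only an assumption about how such a check might look, not the code from the webrev; the function names fd_is_writable and sync_if_writable are hypothetical. The access mode is read with |fcntl(fd, F_GETFL)| and masked with O_ACCMODE, and |fsync()| is skipped for descriptors that were not opened for writing.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if fd was opened for writing, 0 if it is read-only,
 * and -1 if fcntl() fails (for example because fd is not open).
 */
int fd_is_writable(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    int mode = flags & O_ACCMODE;
    return mode == O_WRONLY || mode == O_RDWR;
}

/*
 * Calls fsync() only on writable descriptors. A read-only descriptor
 * has nothing to flush, so it is reported as success (0) without
 * calling fsync(), which avoids the EBADF error described above.
 */
int sync_if_writable(int fd)
{
    int writable = fd_is_writable(fd);
    if (writable < 0) {
        return -1;
    }
    if (writable == 0) {
        return 0;
    }
    return fsync(fd);
}
```

Querying the descriptor itself keeps the check local to the native code, which matches the constraint mentioned above that the |writable| attribute of the file channel is not accessible at that point.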
> > > src/solaris/classes/java/lang/UNIXProcess.java.aix > > * On AIX the implementation is especially tricky, because the > |close()| system call will block if another thread is at the same > time blocked in a file operation (e.g. 'read()') on the same file > descriptor. We therefore combine the AIX |ProcessPipeInputStream| > implemenatation with the |DeferredCloseInputStream| approach used on > Solaris (see |UNIXProcess.java.solaris|). This means that every > potentially blocking operation on the file descriptor increments a > counter before it is executed and decrements it once it finishes. > The 'close()' operation will only be executed if there are no > pending operations. Otherwise it is deferred after the last pending > operation has finished. > > > src/share/transport/socket/socketTransport.c > > * On AIX we have to call |shutdown()| on a socket descriptor before > closing it, otherwise the |close()| call may be blocked. This is the > same problem as described before. Unfortunately the JDI framework > doesn't use the same IO wrappers like other class library components > so we can not easily use the |NET_| abstractions from |aix_close.c| > here. > * Without this small change all JDI regression tests will fail on AIX > because of the way how the tests act as a "debugger" which launches > another VM (the "debugge") which connects itself back to the > debugger. In this scenario the "debugge" can not shut down itself > because one thread will always be blocked in the |close()| call on > one of the communication sockets. > > > src/solaris/native/java/net/NetworkInterface.c > > * Set the scope identifier for IPv6 addresses on AIX. > > > src/solaris/native/java/net/net_util_md.c > > * It turns out that we do not always have to replace |SO_REUSEADDR| on > AIX by |SO_REUSEPORT|. Instead we can simply use the same approach > like BSD and only use |SO_REUSEPORT| additionally, if several > datagram sockets try to bind to the same port. 
> * Also fixed a comment and removed unused local variables. > * Fixed the obviously inverted assignment |newTime = prevTime;| which > should read |prevTime = newTime;|. Otherwise |prevTime| will never > change and the timeout will be potential reached too fast. > > > src/solaris/native/sun/management/OperatingSystemImpl.c > > * AIX does not understand |/proc/self| so we have to query the real > process ID to access the proc file system. > > > src/solaris/native/sun/nio/ch/DatagramChannelImpl.c > > * On AIX, |connect()| may legally return |EAFNOSUPPORT| if called on a > socket with the address family set to |AF_UNSPEC|. > > From goetz.lindenmaier at sap.com Tue Jan 14 05:52:04 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 14 Jan 2014 13:52:04 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> Hi, I updated this webrev. I detected a small flaw I made when editing this version. The #endif in line 322, parse3.cpp was in the wrong line. I also based the webrev on the latest version of the stage repo. http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ Best regards, Goetz. 
-----Original Message----- From: Lindenmaier, Goetz Sent: Freitag, 20. Dezember 2013 13:47 To: David Holmes Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Hi David, > So we can at least undo #4 now we have established those tests were not > required to pass. We would prefer if we could keep this in. We want to avoid that it's blamed on the VM if java programs are failing on PPC after they worked on x86. To clearly mark it as overfulfilling the spec I would guard it by a flag as proposed. But if you insist I will remove it. Also, this part is not that performance relevant. > A compile-time guard (ifdef) would be better than a runtime one I think I added a compile-time guard in this new webrev: http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC) but this way I only have to change the ppc platform. Best regards, Goetz P.S.: I will also be available over the Christmas period. -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Freitag, 20. Dezember 2013 05:58 To: Lindenmaier, Goetz Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Sorry for the delay, it takes a while to catch up after two weeks vacation :) Next vacation (ie next two weeks) I'll continue to check emails. On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: > Hi, > > ok, I understand the tests are wrong. It's good this issue is settled. > Thanks Aleksey and Andreas for going into the details of the proof! > > About our change: David, the causality is the other way round. > The change is about IRIW. > 1. To pass IRIW, we must use sync instructions before loads. 
This is the part I still have some question marks over as the implications are not nice for performance on non-TSO platforms. But I'm no further along in processing that paper I'm afraid. > 2. If we do syncs before loads, we don't need to do them after stores. > 3. If we don't do them after stores, we fail the volatile constructor tests. > 4. So finally we added them again at the end of the constructor after stores > to pass the volatile constructor tests. So we can at least undo #4 now we have established those tests were not required to pass. > We originally passed the constructor tests because the ppc memory order > instructions are not as find-granular as the > operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction > on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the > MemBarVolatile after the store fixes the constructor tests. The proper representation > of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless > anyways. > >> I'm not happy with the ifdef approach but I won't block it. > I'd be happy to add a property > OrderAccess::cpu_is_multiple_copy_atomic() A compile-time guard (ifdef) would be better than a runtime one I think - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic based not architecture based) as that will allows for turning this on/off for any architecture for testing purposes. Thanks, David > or the like to guard the customization. I'd like that much better. Or also > OrderAccess::needs_support_iriw_ordering() > VM_Version::needs_support_iriw_ordering() > > > Best regards, > Goetz. > > > > > > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Donnerstag, 28. 
November 2013 00:34 > To: Lindenmaier, Goetz > Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > TL;DR version: > > Discussion on the c-i list has now confirmed that a constructor-barrier > for volatiles is not required as part of the JMM specification. It *may* > be required in an implementation that doesn't pre-zero memory to ensure > you can't see uninitialized fields. So the tests for this are invalid > and this part of the patch is not needed in general (ppc64 may need it > due to other factors). > > Re: "multiple copy atomicity" - first thanks for correcting the term :) > Second thanks for the reference to that paper! For reference: > > "The memory system (perhaps involving a hierarchy of buffers and a > complex interconnect) does not guarantee that a write becomes visible to > all other hardware threads at the same time point; these architectures > are not multiple-copy atomic." > > This is the visibility issue that I referred to and affects both ARM and > PPC. But of course it is normally handled by using suitable barriers > after the stores that need to be visible. I think the crux of the > current issue is what you wrote below: > > > The fixes for the constructor issue are only needed because we > > remove the sync instruction from behind stores (parse3.cpp:320) > > and place it before loads. > > I hadn't grasped this part. Obviously if you fail to do the sync after > the store then you have to do something around the loads to get the same > results! I still don't know what lead you to the conclusion that the > only way to fix the IRIW issue was to put the fence before the load - > maybe when I get the chance to read that paper in full it will be clearer. > > So ... the basic problem is that the current structure in the VM has > hard-wired one choice of how to get the right semantics for volatile > variables. 
You now want to customize that but not all the requisite > hooks are present. It would be better if volatile_load and > volatile_store were factored out so that they could be implemented as > desired per-platform. Alternatively there could be pre- and post- hooks > that could then be customized per platform. Otherwise you need > platform-specific ifdef's to handle it as per your patch. > > I'm not happy with the ifdef approach but I won't block it. I think this > is an area where a lot of clean up is needed in the VM. The barrier > abstractions are a confused mess in my opinion. > > Thanks, > David > ----- > > On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> I updated the webrev to fix the issues mentioned by Vladimir: >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >> >> I did not yet add the >> OrderAccess::needs_support_iriw_ordering() >> VM_Version::needs_support_iriw_ordering() >> or >> OrderAccess::cpu_is_multiple_copy_atomic() >> to reduce #defined, as I got no further comment on that. >> >> >> WRT to the validity of the tests and the interpretation of the JMM >> I feel not in the position to contribute substantially. >> >> But we would like to pass the torture test suite as we consider >> this a substantial task in implementing a PPC port. Also we think >> both tests show behavior a programmer would expect. It's bad if >> Java code runs fine on the more common x86 platform, and then >> fails on ppc. This will always first be blamed on the VM. >> >> The fixes for the constructor issue are only needed because we >> remove the sync instruction from behind stores (parse3.cpp:320) >> and place it before loads. Then there is no sync between volatile store >> and publishing the object. So we add it again in this one case >> (volatile store in constructor). >> >> >> @David >>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>> and that's being discussed now for almost a year. 
>>> It may have started a year ago but work on it has hardly been continuous.
>> That's not true, we did a lot of investigation and testing on this issue.
>> And we came up with a solution we consider the best possible. If you
>> have objections, you should at least give the draft of a better solution,
>> we would volunteer to implement and test it.
>> Similarly, we invested time in fixing the concurrency torture issues.
>>
>> @David
>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>> can't find any reference to it.
>> We learned about this reading "A Tutorial Introduction to the ARM and
>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the taskqueue problem.
>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>
>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read'
>> instead. Sorry for that. (I also fixed that in the method name above).
>>
>> Best regards and thanks for all your involvement,
>> Goetz.
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Wednesday, 27 November 2013 12:53
>> To: Lindenmaier, Goetz
>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi Goetz,
>>
>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> -- Volatile in constructor
>>>> AFAIK we have not seen those tests fail due to a
>>>> missing constructor barrier.
>>> We see them on PPC64. Our test machines have typically 8-32 processors
>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>
>> And see follow ups - the tests are invalid.
>> >>> -- IRIW issue >>>> I can not possibly answer to the necessary level of detail with a few >>>> moments thought. >>> Sure. There also is no solution as you require for the taskqueue problem yet, >>> and that's being discussed now for almost a year. >> >> It may have started a year ago but work on it has hardly been continuous. >> >>>> You are implying there is a problem here that will >>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>> solution with the #defines, and that's correct for all, but not nice, I admit. >>> (I don't really know about ARM, though). >>> So if I can write down a nicer solution testing for methods that are evaluated >>> by the C-compiler I'm happy. >>> >>> The problem is not that IRIW is not handled by the JMM, the problem >>> is that >>> store >>> sync >>> does not assure multiple-read-atomicity, >>> only >>> sync >>> load >>> does so on PPC. And you require multiple-read-atomicity to >>> pass that test. >> >> What is "multiple-read-atomicity"? I'm not familiar with the term and >> can't find any reference to it. >> >> Thanks, >> David >> >> The JMM is fine. And >>> store >>> MemBarVolatile >>> is fine on x86, sparc etc. as there exist assembler instructions that >>> do what is required. >>> >>> So if you are off soon, please let's come to a solution that >>> might be improvable in the way it's implemented, but that >>> allows us to implement a correct PPC64 port. >>> >>> Best regards, >>> Goetz. 
>>> >>> >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Tuesday, November 26, 2013 1:11 PM >>> To: Lindenmaier, Goetz >>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>> Hi everybody, >>>> >>>> thanks a lot for the detailed reviews! >>>> I'll try to answer to all in one mail. >>>> >>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>> We don't think it's correct if we omit the barrier after initializing >>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>> and Doug Lea, and they agreed. >>>> Also, concurrency torture tests >>>> LongVolatileTest >>>> AtomicIntegerInitialValueTest >>>> will fail. >>>> (In addition, observing 0 instead of the inital value of a volatile field would be >>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>> >>> The affects of unsafe publication are always surprising - volatiles do >>> not add anything special here. AFAIK there is nothing in the JMM that >>> requires the constructor barrier - discussions with Doug and Aleksey >>> notwithstanding. AFAIK we have not seen those tests fail due to a >>> missing constructor barrier. >>> >>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>> see a way to implement this cheaper. >>>> >>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>> Basically, I agree on this. 
But you also have to take into account >>>> that due to the different memory ordering instructions on different platforms >>>> just implementing something empty is not sufficient. >>>> An example: >>>> MemBarRelease // means LoadStore, StoreStore barrier >>>> MemBarVolatile // means StoreLoad barrier >>>> If these are consecutively in the code, sparc code looks like this: >>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>> Just doing what is required. >>>> On Power, we get suboptimal code, as there are no comparable, >>>> fine grained operations: >>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>> I need an additional optimization that removes the lwsync. I can not implement >>>> MemBarRelease empty, as it is also used independently. >>>> >>>> Back to the IRIW problem. I think here we have a comparable issue. >>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>> is inefficient on platforms that have multiple-read-atomicity. >>>> >>>> I would propose to guard the code by >>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>> OrderAccess::cpu_is_multiple_read_atomic() >>>> Else, David, how would you propose to implement this platform independent? >>>> (Maybe we can also use above method in taskqueue.hpp.) >>> >>> I can not possibly answer to the necessary level of detail with a few >>> moments thought. You are implying there is a problem here that will >>> impact numerous platforms (unless you can tell me why ppc is so >>> different?) and I can not take that on face value at the moment. 
The >>> only reason I can see IRIW not being handled by the JMM requirements for >>> volatile accesses is if there are global visibility issues that are not >>> addressed - but even then I would expect heavy barriers at the store >>> would deal with that, not at the load. (This situation reminds me of the >>> need for read-barriers on Alpha architecture due to the use of software >>> cache-coherency rather than hardware cache-coherency - but we don't have >>> that on ppc!) >>> >>> Sorry - There is no quick resolution here and in a couple of days I will >>> be heading out on vacation for two weeks. >>> >>> David >>> ----- >>> >>>> Best regards, >>>> Goetz. >>>> >>>> -- Other ports: >>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>> not happen on small machines. But I can use PPC_ONLY instead >>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>> >>>> -- MemBarStoreStore after initialization >>>> I agree we should not change it in the ppc port. If you wish, I can >>>> prepare an extra webrev for hotspot-comp. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>> To: Vladimir Kozlov >>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> Okay this is my second attempt at answering this in a reasonable way :) >>>> >>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>> I have to ask David to do correctness evaluation. >>>> >>>> From what I understand what we see here is an attempt to fix an >>>> existing issue with the implementation of volatiles so that the IRIW >>>> problem is addressed. 
The solution proposed for PPC64 is to make >>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>> load. >>>> >>>> Now if this was purely handled in ppc64 source code then I would be >>>> happy to let them do whatever they like (surely this kills performance >>>> though!). But I do not agree with the changes to the shared code that >>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>> polluting the shared code. My concern is similar to what I said with the >>>> taskQueue changes - these algorithms should be expressed using the >>>> correct OrderAccess operations to guarantee the desired properties >>>> independent of architecture. If such a "barrier" is not needed on a >>>> given architecture then the implementation in OrderAccess should reduce >>>> to a no-op. >>>> >>>> And as Vitaly points out the constructor barriers are not needed under >>>> the JMM. >>>> >>>>> I am fine with suggested changes because you did not change our current >>>>> code for our platforms (please, do not change do_exits() now). >>>>> But may be it should be done using more general query which is set >>>>> depending on platform: >>>>> >>>>> OrderAccess::needs_support_iriw_ordering() >>>>> >>>>> or similar to what we use now: >>>>> >>>>> VM_Version::needs_support_iriw_ordering() >>>> >>>> Every platform has to support IRIW this is simply part of the Java >>>> Memory Model, there should not be any need to call this out explicitly >>>> like this. >>>> >>>> Is there some subtlety of the hardware I am missing here? Are there >>>> visibility issues beyond the ordering constraints that the JMM defines? >>>>> From what I understand our ppc port is also affected. David? >>>> >>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>> >>>> David >>>> ----- >>>> >>>>> In library_call.cpp can you add {}? New comment should be inside else {}. 
>>>>> >>>>> I think you should make _wrote_volatile field not ppc64 specific which >>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY() >>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs. >>>>> >>>>> In do_put_xxx() can you combine your changes: >>>>> >>>>> if (is_vol) { >>>>> // See comment in do_get_xxx(). >>>>> #ifndef PPC64 >>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>> #else >>>>> if (is_field) { >>>>> // Add MemBarRelease for constructors which write volatile field >>>>> (PPC64). >>>>> set_wrote_volatile(true); >>>>> } >>>>> #endif >>>>> } >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of >>>>>> the torture test suite: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>> >>>>>> Example: >>>>>> volatile x=0, y=0 >>>>>> __________ __________ __________ __________ >>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>> >>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>> read(y) read(x) >>>>>> >>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>> >>>>>> >>>>>> Solution: This example requires multiple-copy-atomicity. This is only >>>>>> assured by the sync instruction and if it is executed in the threads >>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire >>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>> MemBarVolatile happens to be implemented by sync. >>>>>> We fix this in C2 and the cpp interpreter. >>>>>> >>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs >>>>>> fails on staxf" for taskqueue.hpp. >>>>>> >>>>>> Further this change contains a fix that assures that volatile fields >>>>>> written in constructors are visible before the reference gets >>>>>> published. >>>>>> >>>>>> >>>>>> Looking at the code, we found a MemBarRelease that to us, seems too >>>>>> strong. 
>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>> What do you think?
>>>>>>
>>>>>> Please review and test this change.
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>

From volker.simonis at gmail.com Tue Jan 14 06:10:27 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 14 Jan 2014 15:10:27 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: <52D51FAC.8060800@oracle.com>
References: <52D51FAC.8060800@oracle.com>
Message-ID:

On Tue, Jan 14, 2014 at 12:29 PM, David Holmes wrote:

> Just a note on this part (I haven't looked at the code):
>
>> On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT,
>> ...) are defined to different values than on other operating systems. The
>> problem is however, that these constants are hardcoded as public final
>> static members of various, shared Java classes.
>>
>
> Sounds like this should be handled the same way that the other "system
> constants" are handled - you can either store a platform file in the repo
> (for cross-compiling) or you generate the class containing the constants at
> build time.
>
>
Hi David,

thanks for your comments. That sounds like a good idea but I'm not sure if it would make sense to duplicate the following files:

src/share/classes/sun/nio/ch/AbstractPollArrayWrapper.java:
src/solaris/classes/sun/nio/ch/Port.java

because of this. Do you have a concrete example where Java classes are being generated with different constants in the class library build? Both solutions would result in different class files on AIX and other Unix variants.

What do you think about assigning the concrete values depending on 'os.name' in the static initializers of the corresponding classes? I think that shouldn't introduce too much overhead and I could get rid of all the ugly conversion code.
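For readers who have not looked at the webrev, the kind of conversion code referred to here could look roughly like the following C sketch. It is an illustration under stated assumptions, not the actual change: the JAVA_POLL* names and their values are assumed stand-ins for the constants hardcoded in the shared Java classes, and the two function names are hypothetical. The native values come from |poll.h| of the platform the code is compiled on.

```c
#include <poll.h>
#include <stddef.h>

/* Assumed Java-side event bits (illustrative values; verify against the Java classes). */
enum {
    JAVA_POLLIN   = 0x0001,
    JAVA_POLLOUT  = 0x0004,
    JAVA_POLLERR  = 0x0008,
    JAVA_POLLHUP  = 0x0010,
    JAVA_POLLNVAL = 0x0020
};

/* One entry per event: the Java-side bit and the native bit from <poll.h>. */
struct event_pair {
    short java_bit;
    short native_bit;
};

static const struct event_pair event_map[] = {
    { JAVA_POLLIN,   POLLIN   },
    { JAVA_POLLOUT,  POLLOUT  },
    { JAVA_POLLERR,  POLLERR  },
    { JAVA_POLLHUP,  POLLHUP  },
    { JAVA_POLLNVAL, POLLNVAL }
};

#define EVENT_MAP_LEN (sizeof event_map / sizeof event_map[0])

/* Translate a Java-side event mask into the native mask before calling poll(). */
short java_to_native_events(short java_events)
{
    short native_events = 0;
    for (size_t i = 0; i < EVENT_MAP_LEN; i++) {
        if (java_events & event_map[i].java_bit) {
            native_events |= event_map[i].native_bit;
        }
    }
    return native_events;
}

/* Translate the native result mask back into Java-side bits after poll() returns. */
short native_to_java_events(short native_events)
{
    short java_events = 0;
    for (size_t i = 0; i < EVENT_MAP_LEN; i++) {
        if (native_events & event_map[i].native_bit) {
            java_events |= event_map[i].java_bit;
        }
    }
    return java_events;
}
```

On a platform where the native values happen to equal the Java-side ones, both functions reduce to the identity on these bits, which is why such a mapping would only be needed where the values differ.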
Regards, Volker > David > > > On 14/01/2014 6:40 PM, Volker Simonis wrote: > >> Hi, >> >> could you please review the following changes for the ppc-aix-port >> stage/stage-9 repositories (the changes are planned for integration into >> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): >> >> http://cr.openjdk.java.net/~simonis/webrevs/8031581/ >> >> I've build and smoke tested without any problems on Linux/x86_64 and >> PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. >> >> With these changes (and together with the changes from "8028537: PPC64: >> Updated jdk/test scripts to understand the AIX os and environment" and >> "8031134 : PPC64: implement printing on AIX") our port passes all but >> the following 7 jtreg regression tests on AIX (compared to the >> Linux/x86_64 baseline from >> www.java.net/download/jdk8/testresults/testresults.html >> ?): >> >> >> java/net/Inet6Address/B6558853.java >> java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) >> java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java >> java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) >> java/nio/channels/Selector/RacyDeregister.java >> sun/security/krb5/auto/Unreachable.java (only on IPv6) >> >> Thank you and best regards, >> Volker >> >> >> Following a detailed description of the various changes: >> >> >> src/share/native/java/util/zip/zip_util.c >> src/share/native/sun/management/DiagnosticCommandImpl.c >> >> * According to ISO C it is perfectly legal for malloc to return zero >> >> if called with a zero argument. Fix various places where malloc can >> potentially correctly return zero because it was called with a zero >> argument. >> * Also fixed |DiagnosticCommandImpl.c| to include |stdlib.h|. 
This >> >> only fixes a compiler warning on Linux, but on AIX it prevents a VM >> crash later on because the return value of |malloc()| will be casted >> to |int| which is especially bad if that pointer was bigger than >> 32-bit. >> >> >> make/CompileJavaClasses.gmk >> >> * Also use |PollingWatchService| on AIX. >> >> >> make/lib/NioLibraries.gmk >> src/aix/native/sun/nio/ch/AixNativeThread.c >> >> * Put the implementation for the native methods of |NativeThread| into >> >> |AixNativeThread.c| on AIX. >> >> >> src/solaris/native/sun/nio/ch/PollArrayWrapper.c >> src/solaris/native/sun/nio/ch/Net.c >> src/aix/classes/sun/nio/ch/AixPollPort.java >> src/aix/native/sun/nio/ch/AixPollPort.c >> src/aix/native/java/net/aix_close.c >> >> * On AIX, the constants used for the polling events (i.e. |POLLIN|, >> |POLLOUT|, ...) are defined to different values than on other >> >> operating systems. The problem is however, that these constants are >> hardcoded as public final static members of various, shared Java >> classes. We therefore have to map them from Java to native every >> time before calling one of the native poll functions and back to >> Java after the call on AIX in order to get the right semantics. >> >> >> src/share/classes/java/nio/file/CopyMoveHelper.java >> >> * As discussed on the core-libs mailing list (see >> >> http://mail.openjdk.java.net/pipermail/core-libs-dev/2013- >> December/024119.html) >> it is not necessary to call |Files.getFileAttributeView()| with any >> |linkOptions| because at that place we've already checked that the >> target file can not be a symbolic link. This change makes the >> implementation more robust on platforms which support symbolic links >> but do not support the |O_NOFOLLOW| flag to the |open| system call. >> It also makes the JDK pass the |demo/zipfs/basic.sh| test on AIX. >> >> >> src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java >> >> * Support "compound text" on AIX in the same way like on other Unix >> platforms. 
>> >> >> src/share/classes/sun/tools/attach/META-INF/services/com. >> sun.tools.attach.spi.AttachProvider >> >> * Define the correct attach provider for AIX. >> >> >> src/solaris/native/java/net/net_util_md.h >> src/solaris/native/sun/nio/ch/FileDispatcherImpl.c >> src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c >> >> * AIX needs a workaround for I/O cancellation (see: >> >> http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index. >> jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). >> "..The |close()| subroutine is blocked until all subroutines which >> use the file descriptor return to usr space. For example, when a >> thread is calling close and another thread is calling select with >> the same file descriptor, the close subroutine does not return until >> the select call returns...". To fix this problem, we have to use the >> various |NET_| wrappers which are declared in |net_util_md.h| and >> defined in |aix_close.c| and we also need some additional wrappers >> for |fcntl()|, |read()| and |write()| on AIX. >> >> While the current solution isn't really nice because it introduces >> some more AIX-specifc sections in shared code, I think it is the >> best way to go for JDK 8 because it imposes the smallest possible >> changes and risks for the existing platforms. I'm ready to change >> the code to unconditionally use the wrappers for all platforms and >> implement the wrappers empty on platforms which don't need any >> wrapping. I think it would also be nice to clean up the names (e.g. >> |NET_Read()| is currently a wrapper for |recv()| and the |NET_| >> prefix is probably not appropriate any more so maybe change it to >> something like |IO_|). But again, I'll prefer to keep that as a >> >> follow up change for JDK9. >> * Calling |fsync()| on a "read-only" file descriptor on AIX will >> >> result in an error (i.e. "EBADF: The FileDescriptor parameter is not >> a valid file descriptor open for writing."). 
To prevent this error
>>     we have to query if the corresponding file descriptor is writeable.
>>     Notice that at that point we can not access the |writable| attribute
>>     of the corresponding file channel so we have to use |fcntl()|.
>>
>>
>> src/solaris/classes/java/lang/UNIXProcess.java.aix
>>
>>   * On AIX the implementation is especially tricky, because the
>>     |close()| system call will block if another thread is at the same
>>     time blocked in a file operation (e.g. 'read()') on the same file
>>     descriptor. We therefore combine the AIX |ProcessPipeInputStream|
>>     implementation with the |DeferredCloseInputStream| approach used on
>>     Solaris (see |UNIXProcess.java.solaris|). This means that every
>>     potentially blocking operation on the file descriptor increments a
>>     counter before it is executed and decrements it once it finishes.
>>     The 'close()' operation will only be executed if there are no
>>     pending operations. Otherwise it is deferred until after the last
>>     pending operation has finished.
>>
>>
>> src/share/transport/socket/socketTransport.c
>>
>>   * On AIX we have to call |shutdown()| on a socket descriptor before
>>     closing it, otherwise the |close()| call may be blocked. This is the
>>     same problem as described before. Unfortunately the JDI framework
>>     doesn't use the same IO wrappers as other class library components
>>     so we can not easily use the |NET_| abstractions from |aix_close.c|
>>     here.
>>   * Without this small change all JDI regression tests will fail on AIX
>>     because of the way the tests act as a "debugger" which launches
>>     another VM (the "debuggee") which connects itself back to the
>>     debugger. In this scenario the "debuggee" can not shut itself down
>>     because one thread will always be blocked in the |close()| call on
>>     one of the communication sockets.
>>
>>
>> src/solaris/native/java/net/NetworkInterface.c
>>
>>   * Set the scope identifier for IPv6 addresses on AIX.
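The deferred-close counting described above for UNIXProcess.java.aix can be sketched roughly as follows. This is a simplified illustration, not the code from the webrev: the class DeferredCloser and its methods begin(), end() and close() are hypothetical names, and the real close is represented by a plain Runnable.

```java
// Hypothetical sketch of a deferred close: the real close action only runs
// once no potentially blocking operation is in flight.
// DeferredCloser and all of its method names are illustrative assumptions.
final class DeferredCloser {
    private final Runnable closeAction;      // performs the real close
    private int pending = 0;                 // operations currently in flight
    private boolean closeRequested = false;  // close() was called while busy
    private boolean closed = false;          // the real close has happened

    DeferredCloser(Runnable closeAction) {
        this.closeAction = closeAction;
    }

    // Call before a potentially blocking operation.
    // Returns false if a close was already requested or performed.
    synchronized boolean begin() {
        if (closeRequested || closed) {
            return false;
        }
        pending++;
        return true;
    }

    // Call after the operation finishes; runs a deferred close if this
    // was the last pending operation.
    synchronized void end() {
        pending--;
        if (pending == 0 && closeRequested && !closed) {
            doClose();
        }
    }

    // Closes immediately if nothing is pending, otherwise defers the close.
    synchronized void close() {
        if (closed) {
            return;
        }
        closeRequested = true;
        if (pending == 0) {
            doClose();
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }

    private void doClose() {
        closed = true;
        closeAction.run();
    }
}
```

A reader would bracket each blocking call with begin() and end() (the latter in a finally block), so that a concurrent close() never runs while a read is still blocked on the descriptor.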
>>
>>
>> src/solaris/native/java/net/net_util_md.c
>>
>>   * It turns out that we do not always have to replace |SO_REUSEADDR| on
>>     AIX by |SO_REUSEPORT|. Instead we can simply use the same approach
>>     as BSD and only use |SO_REUSEPORT| additionally, if several
>>     datagram sockets try to bind to the same port.
>>   * Also fixed a comment and removed unused local variables.
>>   * Fixed the obviously inverted assignment |newTime = prevTime;| which
>>     should read |prevTime = newTime;|. Otherwise |prevTime| will never
>>     change and the timeout will potentially be reached too fast.
>>
>>
>> src/solaris/native/sun/management/OperatingSystemImpl.c
>>
>>   * AIX does not understand |/proc/self| so we have to query the real
>>     process ID to access the proc file system.
>>
>>
>> src/solaris/native/sun/nio/ch/DatagramChannelImpl.c
>>
>>   * On AIX, |connect()| may legally return |EAFNOSUPPORT| if called on a
>>     socket with the address family set to |AF_UNSPEC|.
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140114/e6f3ed73/attachment.html

From volker.simonis at gmail.com Tue Jan 14 08:57:48 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 14 Jan 2014 17:57:48 +0100
Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX
Message-ID:

Hi,

could you please review the following change:

http://cr.openjdk.java.net/~simonis/webrevs/8028537/

which, together with the changes from "8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests" and "8031134 : PPC64: implement printing on AIX", enables our port to pass all but the following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 baseline from www.java.net/download/jdk8/testresults/testresults.html):

java/net/Inet6Address/B6558853.java
java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically)
java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java
java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically)
java/nio/channels/Selector/RacyDeregister.java
sun/security/krb5/auto/Unreachable.java (only on IPv6)

The change is only big in the number of files it touches but rather small in the amount of actual code changes, so I flagged it with S/L.

Most changes simply add AIX as a known platform to the OS detection sections of the various tests. They are either of the form:

  case "$OS" in
-    SunOS | Linux | Darwin )
+    SunOS | Linux | Darwin | AIX )
      PATHSEP=":"

or:

      isSolaris=true
      ;;
+    AIX )
+      OS="AIX"
+      isAIX=true
+      ;;
    Windows* )
      OS="Windows"

The following explanations only mention tests with changes different from the ones above:

test/ProblemList.txt

- Added three tests which currently don't work on AIX.

test/com/sun/java/swing/plaf/windows/8016551/bug8016551.java

- This test calls JFrame.setDefaultCloseOperation() which is not allowed under the security manager which is active if the tests are running in agentvm mode.
So better always run this test in othervm mode.

test/com/sun/jdi/PrivateTransportTest.sh

- On AIX, we have to use LIBPATH instead of LD_LIBRARY_PATH.

test/com/sun/nio/sctp/SctpChannel/Util.java
test/com/sun/nio/sctp/SctpMultiChannel/Util.java
test/com/sun/nio/sctp/SctpServerChannel/Util.java

- On AIX, we currently haven't implemented SCTP but we nevertheless compile the shared SCTP classes into the runtime class library. This way the AIX JDK can at least compile SCTP applications although it can not run them. To support this scenario, the runtime check for the availability of SCTP has to be extended to catch UnsatisfiedLinkError and NoClassDefFoundError. UnsatisfiedLinkError will be thrown the first time the class SctpChannelImpl is loaded because it cannot load its native support library in the static initialisation section. On the next load attempt of the class, a NoClassDefFoundError will be thrown because of the previously failed initialisation.

test/java/net/DatagramSocket/Send12k.java

- AIX throws an "IOException: Message too long" if the message is too long. DatagramSocket.send() is specified to throw an IOException so better don't be too specific in the catch clause.

test/java/nio/file/Files/SBC.java

- AIX actually supports symbolic links, but it does not support NOFOLLOW_LINKS, or more exactly the O_NOFOLLOW flag to the open() system call which is tested here.

test/java/nio/file/Files/walkFileTree/find.sh

- On AIX find -follow may core dump on recursive links without '-L' (see http://www-01.ibm.com/support/docview.wss?uid=isg1IV28143).

test/java/util/logging/AnonLoggerWeakRefLeak.sh
test/java/util/logging/LoggerWeakRefLeak.sh

- Only treat missing jmap options as warnings and not as errors.

test/sun/rmi/rmic/newrmic/equivalence/batch.sh

- On AIX the diff utility doesn't detect binary files and thus outputs the full diff of different class files which makes the test fail.
Because the generated class files aren't used or checked anyway we can completely omit the generation of class files by always using the -nowrite option.
- Also reformatted the command lines to make the differences more apparent.

test/tools/launcher/ExecutionEnvironment.java

- On AIX, we have to use LIBPATH instead of LD_LIBRARY_PATH.
- AIX does not support the -rpath linker option so the launchers have to prepend the jdk library path to LIBPATH.

test/tools/launcher/Settings.java

- On PPC64 we need a bigger stack size.

Thank you and best regards,
Volker
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140114/0c97a9bc/attachment-0001.html

From Alan.Bateman at oracle.com Tue Jan 14 11:13:25 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Tue, 14 Jan 2014 19:13:25 +0000
Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX
In-Reply-To:
References:
Message-ID: <52D58C55.6070004@oracle.com>

On 14/01/2014 16:57, Volker Simonis wrote:
> :
>
> test/com/sun/nio/sctp/SctpChannel/Util.java
> test/com/sun/nio/sctp/SctpMultiChannel/Util.java
> test/com/sun/nio/sctp/SctpServerChannel/Util.java
>
> - On AIX, we currently haven't implemented SCTP but we nevertheless
> compile the shared SCTP classes into the runtime class library. This way
> the AIX JDK can at least compile SCTP applications although it can not run
> them. To support this scenario, the runtime check for the availability of
> SCTP has to be extended to catch UnsatisfiedLinkError and
> NoClassDefFoundError. UnsatisfiedLinkError will be thrown the first time
> the class SctpChannelImpl is loaded because it cannot load
> its native support library in the static initialisation section. On the
> next load attempt of the class, a NoClassDefFoundError will be thrown
> because of the previously failed initialisation.
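The two-step failure quoted above (UnsatisfiedLinkError on the first use of the class, NoClassDefFoundError on later uses) is general JVM class-initialization behaviour and can be reproduced with a small sketch. The class names and the library name below are made up for illustration; NeedsNativeLib merely stands in for a class like SctpChannelImpl.

```java
// Sketch: a class whose static initializer fails to load a native library.
// ClassInitFailureDemo, NeedsNativeLib and the library name are hypothetical.
final class ClassInitFailureDemo {

    // Stand-in for a class with native support code. The library name is
    // invented so that loading it is expected to fail.
    static final class NeedsNativeLib {
        static {
            System.loadLibrary("hypothetical_missing_native_lib");
        }

        static void touch() {
            // Calling this method triggers class initialization.
        }
    }

    // Returns the simple name of the error raised by using the class,
    // or "none" if the class initialized successfully.
    static String tryTouch() {
        try {
            NeedsNativeLib.touch();
            return "none";
        } catch (LinkageError e) {
            // UnsatisfiedLinkError and NoClassDefFoundError are both LinkageErrors.
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first attempt:  " + tryTouch());
        System.out.println("second attempt: " + tryTouch());
    }
}
```

Run in a fresh JVM, the first attempt should report UnsatisfiedLinkError and the second NoClassDefFoundError, which is why an availability check that only catches one of the two can still fail.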
> OS X has the same issue and the solution used there are stub implementations that just throw UOE. Details in jdk/src/macosx/classes/sun/nio/ch/sctp and that maybe that would work for AIX too. -Alan. From david.holmes at oracle.com Tue Jan 14 16:55:28 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 15 Jan 2014 10:55:28 +1000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> Message-ID: <52D5DC80.1040003@oracle.com> Hi Goetz, Sorry for the delay in getting back to this. The general changes to the volatile barriers to support IRIW are okay. The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more specifically it is not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of the commentary excessive, particularly for shared code. In particular the IRIW example in parse3.cpp - it seems a strange place to give the explanation and I don't think we need it to that level of detail. Seems to me that is present is globalDefinitions_ppc.hpp is quite adequate. The changes related to volatile writes in the constructor, as discussed are not required by the Java Memory Model. 
If you want to keep these then I think they should all be guarded with PPC64 because it is not related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the PPC64 porters. Thanks, David On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: > Hi, > > I updated this webrev. I detected a small flaw I made when editing this version. > The #endif in line 322, parse3.cpp was in the wrong line. > I also based the webrev on the latest version of the stage repo. > http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ > > Best regards, > Goetz. > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Freitag, 20. Dezember 2013 13:47 > To: David Holmes > Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' > Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Hi David, > >> So we can at least undo #4 now we have established those tests were not >> required to pass. > We would prefer if we could keep this in. We want to avoid that it's > blamed on the VM if java programs are failing on PPC after they worked > on x86. To clearly mark it as overfulfilling the spec I would guard it by > a flag as proposed. But if you insist I will remove it. Also, this part is > not that performance relevant. > >> A compile-time guard (ifdef) would be better than a runtime one I think > I added a compile-time guard in this new webrev: > http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ > I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces > several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC) > but this way I only have to change the ppc platform. > > Best regards, > Goetz > > P.S.: I will also be available over the Christmas period. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Freitag, 20. 
Dezember 2013 05:58 > To: Lindenmaier, Goetz > Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Sorry for the delay, it takes a while to catch up after two weeks > vacation :) Next vacation (ie next two weeks) I'll continue to check emails. > > On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> ok, I understand the tests are wrong. It's good this issue is settled. >> Thanks Aleksey and Andreas for going into the details of the proof! >> >> About our change: David, the causality is the other way round. >> The change is about IRIW. >> 1. To pass IRIW, we must use sync instructions before loads. > > This is the part I still have some question marks over as the > implications are not nice for performance on non-TSO platforms. But I'm > no further along in processing that paper I'm afraid. > >> 2. If we do syncs before loads, we don't need to do them after stores. >> 3. If we don't do them after stores, we fail the volatile constructor tests. >> 4. So finally we added them again at the end of the constructor after stores >> to pass the volatile constructor tests. > > So we can at least undo #4 now we have established those tests were not > required to pass. > >> We originally passed the constructor tests because the ppc memory order >> instructions are not as find-granular as the >> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction >> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the >> MemBarVolatile after the store fixes the constructor tests. The proper representation >> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless >> anyways. >> >>> I'm not happy with the ifdef approach but I won't block it. 
>> I'd be happy to add a property >> OrderAccess::cpu_is_multiple_copy_atomic() > > A compile-time guard (ifdef) would be better than a runtime one I think > - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic > based not architecture based) as that will allows for turning this > on/off for any architecture for testing purposes. > > Thanks, > David > >> or the like to guard the customization. I'd like that much better. Or also >> OrderAccess::needs_support_iriw_ordering() >> VM_Version::needs_support_iriw_ordering() >> >> >> Best regards, >> Goetz. >> >> >> >> >> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 28. November 2013 00:34 >> To: Lindenmaier, Goetz >> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> TL;DR version: >> >> Discussion on the c-i list has now confirmed that a constructor-barrier >> for volatiles is not required as part of the JMM specification. It *may* >> be required in an implementation that doesn't pre-zero memory to ensure >> you can't see uninitialized fields. So the tests for this are invalid >> and this part of the patch is not needed in general (ppc64 may need it >> due to other factors). >> >> Re: "multiple copy atomicity" - first thanks for correcting the term :) >> Second thanks for the reference to that paper! For reference: >> >> "The memory system (perhaps involving a hierarchy of buffers and a >> complex interconnect) does not guarantee that a write becomes visible to >> all other hardware threads at the same time point; these architectures >> are not multiple-copy atomic." >> >> This is the visibility issue that I referred to and affects both ARM and >> PPC. But of course it is normally handled by using suitable barriers >> after the stores that need to be visible. 
I think the crux of the >> current issue is what you wrote below: >> >> > The fixes for the constructor issue are only needed because we >> > remove the sync instruction from behind stores (parse3.cpp:320) >> > and place it before loads. >> >> I hadn't grasped this part. Obviously if you fail to do the sync after >> the store then you have to do something around the loads to get the same >> results! I still don't know what lead you to the conclusion that the >> only way to fix the IRIW issue was to put the fence before the load - >> maybe when I get the chance to read that paper in full it will be clearer. >> >> So ... the basic problem is that the current structure in the VM has >> hard-wired one choice of how to get the right semantics for volatile >> variables. You now want to customize that but not all the requisite >> hooks are present. It would be better if volatile_load and >> volatile_store were factored out so that they could be implemented as >> desired per-platform. Alternatively there could be pre- and post- hooks >> that could then be customized per platform. Otherwise you need >> platform-specific ifdef's to handle it as per your patch. >> >> I'm not happy with the ifdef approach but I won't block it. I think this >> is an area where a lot of clean up is needed in the VM. The barrier >> abstractions are a confused mess in my opinion. >> >> Thanks, >> David >> ----- >> >> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I updated the webrev to fix the issues mentioned by Vladimir: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>> >>> I did not yet add the >>> OrderAccess::needs_support_iriw_ordering() >>> VM_Version::needs_support_iriw_ordering() >>> or >>> OrderAccess::cpu_is_multiple_copy_atomic() >>> to reduce #defined, as I got no further comment on that. >>> >>> >>> WRT to the validity of the tests and the interpretation of the JMM >>> I feel not in the position to contribute substantially. 
>>> >>> But we would like to pass the torture test suite as we consider >>> this a substantial task in implementing a PPC port. Also we think >>> both tests show behavior a programmer would expect. It's bad if >>> Java code runs fine on the more common x86 platform, and then >>> fails on ppc. This will always first be blamed on the VM. >>> >>> The fixes for the constructor issue are only needed because we >>> remove the sync instruction from behind stores (parse3.cpp:320) >>> and place it before loads. Then there is no sync between volatile store >>> and publishing the object. So we add it again in this one case >>> (volatile store in constructor). >>> >>> >>> @David >>>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>>> and that's being discussed now for almost a year. >>>> It may have started a year ago but work on it has hardly been continuous. >>> That's not true, we did a lot of investigation and testing on this issue. >>> And we came up with a solution we consider the best possible. If you >>> have objections, you should at least give the draft of a better solution, >>> we would volunteer to implement and test it. >>> Similarly, we invested time in fixing the concurrency torture issues. >>> >>> @David >>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>> can't find any reference to it. >>> We learned about this reading "A Tutorial Introduction to the ARM and >>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for >>> Weak Memory Models" by Nhat Minh L?, Antoniu Pop, Albert Cohen >>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem. >>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>> >>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read' >>> instead. Sorry for that. (I also fixed that in the method name above). 
>>> >>> Best regards and thanks for all your involvements, >>> Goetz. >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Mittwoch, 27. November 2013 12:53 >>> To: Lindenmaier, Goetz >>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>> Hi David, >>>> >>>> -- Volatile in constuctor >>>>> AFAIK we have not seen those tests fail due to a >>>>> missing constructor barrier. >>>> We see them on PPC64. Our test machines have typically 8-32 processors >>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!) >>> >>> And see follow ups - the tests are invalid. >>> >>>> -- IRIW issue >>>>> I can not possibly answer to the necessary level of detail with a few >>>>> moments thought. >>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>> and that's being discussed now for almost a year. >>> >>> It may have started a year ago but work on it has hardly been continuous. >>> >>>>> You are implying there is a problem here that will >>>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>>> solution with the #defines, and that's correct for all, but not nice, I admit. >>>> (I don't really know about ARM, though). >>>> So if I can write down a nicer solution testing for methods that are evaluated >>>> by the C-compiler I'm happy. >>>> >>>> The problem is not that IRIW is not handled by the JMM, the problem >>>> is that >>>> store >>>> sync >>>> does not assure multiple-read-atomicity, >>>> only >>>> sync >>>> load >>>> does so on PPC. And you require multiple-read-atomicity to >>>> pass that test. >>> >>> What is "multiple-read-atomicity"? 
I'm not familiar with the term and >>> can't find any reference to it. >>> >>> Thanks, >>> David >>> >>> The JMM is fine. And >>>> store >>>> MemBarVolatile >>>> is fine on x86, sparc etc. as there exist assembler instructions that >>>> do what is required. >>>> >>>> So if you are off soon, please let's come to a solution that >>>> might be improvable in the way it's implemented, but that >>>> allows us to implement a correct PPC64 port. >>>> >>>> Best regards, >>>> Goetz. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>> To: Lindenmaier, Goetz >>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> Hi Goetz, >>>> >>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>> Hi everybody, >>>>> >>>>> thanks a lot for the detailed reviews! >>>>> I'll try to answer to all in one mail. >>>>> >>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>>> We don't think it's correct if we omit the barrier after initializing >>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>>> and Doug Lea, and they agreed. >>>>> Also, concurrency torture tests >>>>> LongVolatileTest >>>>> AtomicIntegerInitialValueTest >>>>> will fail. >>>>> (In addition, observing 0 instead of the inital value of a volatile field would be >>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>>> >>>> The affects of unsafe publication are always surprising - volatiles do >>>> not add anything special here. AFAIK there is nothing in the JMM that >>>> requires the constructor barrier - discussions with Doug and Aleksey >>>> notwithstanding. 
AFAIK we have not seen those tests fail due to a >>>> missing constructor barrier. >>>> >>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>>> see a way to implement this cheaper. >>>>> >>>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>>> Basically, I agree on this. But you also have to take into account >>>>> that due to the different memory ordering instructions on different platforms >>>>> just implementing something empty is not sufficient. >>>>> An example: >>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>> MemBarVolatile // means StoreLoad barrier >>>>> If these are consecutively in the code, sparc code looks like this: >>>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>> Just doing what is required. >>>>> On Power, we get suboptimal code, as there are no comparable, >>>>> fine grained operations: >>>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>>> I need an additional optimization that removes the lwsync. I can not implement >>>>> MemBarRelease empty, as it is also used independently. >>>>> >>>>> Back to the IRIW problem. I think here we have a comparable issue. >>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>> >>>>> I would propose to guard the code by >>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>> Else, David, how would you propose to implement this platform independent? >>>>> (Maybe we can also use above method in taskqueue.hpp.) 
>>>> >>>> I can not possibly answer to the necessary level of detail with a few >>>> moments thought. You are implying there is a problem here that will >>>> impact numerous platforms (unless you can tell me why ppc is so >>>> different?) and I can not take that on face value at the moment. The >>>> only reason I can see IRIW not being handled by the JMM requirements for >>>> volatile accesses is if there are global visibility issues that are not >>>> addressed - but even then I would expect heavy barriers at the store >>>> would deal with that, not at the load. (This situation reminds me of the >>>> need for read-barriers on Alpha architecture due to the use of software >>>> cache-coherency rather than hardware cache-coherency - but we don't have >>>> that on ppc!) >>>> >>>> Sorry - There is no quick resolution here and in a couple of days I will >>>> be heading out on vacation for two weeks. >>>> >>>> David >>>> ----- >>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> -- Other ports: >>>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>> >>>>> -- MemBarStoreStore after initialization >>>>> I agree we should not change it in the ppc port. If you wish, I can >>>>> prepare an extra webrev for hotspot-comp. 
>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>> To: Vladimir Kozlov >>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>> >>>>> Okay this is my second attempt at answering this in a reasonable way :) >>>>> >>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>> I have to ask David to do correctness evaluation. >>>>> >>>>> From what I understand what we see here is an attempt to fix an >>>>> existing issue with the implementation of volatiles so that the IRIW >>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>>> load. >>>>> >>>>> Now if this was purely handled in ppc64 source code then I would be >>>>> happy to let them do whatever they like (surely this kills performance >>>>> though!). But I do not agree with the changes to the shared code that >>>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>>> polluting the shared code. My concern is similar to what I said with the >>>>> taskQueue changes - these algorithms should be expressed using the >>>>> correct OrderAccess operations to guarantee the desired properties >>>>> independent of architecture. If such a "barrier" is not needed on a >>>>> given architecture then the implementation in OrderAccess should reduce >>>>> to a no-op. >>>>> >>>>> And as Vitaly points out the constructor barriers are not needed under >>>>> the JMM. >>>>> >>>>>> I am fine with suggested changes because you did not change our current >>>>>> code for our platforms (please, do not change do_exits() now). 
>>>>>> But maybe it should be done using a more general query which is set
>>>>>> depending on platform:
>>>>>>
>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>
>>>>>> or similar to what we use now:
>>>>>>
>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>
>>>>> Every platform has to support IRIW; this is simply part of the Java
>>>>> Memory Model, there should not be any need to call this out explicitly
>>>>> like this.
>>>>>
>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>>> From what I understand our ppc port is also affected. David?
>>>>>
>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> In library_call.cpp can you add {}? New comment should be inside else {}.
>>>>>>
>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>
>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>
>>>>>>   if (is_vol) {
>>>>>>     // See comment in do_get_xxx().
>>>>>> #ifndef PPC64
>>>>>>     insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>> #else
>>>>>>     if (is_field) {
>>>>>>       // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>>       set_wrote_volatile(true);
>>>>>>     }
>>>>>> #endif
>>>>>>   }
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>> the torture test suite:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>
>>>>>>> Example:
>>>>>>> volatile x=0, y=0
>>>>>>>  __________    __________    __________    __________
>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>
>>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>                 read(y)                     read(x)
>>>>>>>
>>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>>
>>>>>>>
>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>
>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>
>>>>>>> Further, this change contains a fix that assures that volatile fields
>>>>>>> written in constructors are visible before the reference gets
>>>>>>> published.
>>>>>>>
>>>>>>>
>>>>>>> Looking at the code, we found a MemBarRelease that, to us, seems too
>>>>>>> strong.
>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Please review and test this change.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
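[Editorial sketch] The IRIW (Independent Reads of Independent Writes) example quoted above can be written down as a small Java program. This is only an illustration of the litmus test, not the VolatileIRIWTest from the torture suite; the class name `IriwSketch` and its methods are hypothetical. It runs the four threads repeatedly and counts how often the disallowed outcome (thread 1 sees x=1, y=0 while thread 3 sees y=1, x=0) is observed. Under the Java Memory Model that count should stay at zero, which is what the sync-before-load scheme is meant to achieve on PPC64.

```java
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of the IRIW litmus test with Java volatiles.
final class IriwSketch {
    // Two independent volatile variables, as in the quoted example.
    static volatile int x, y;

    // Helper: wait until all four threads are ready to go.
    static void awaitStart(CyclicBarrier start) {
        try {
            start.await();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs one round; returns true if the disallowed outcome was seen. */
    static boolean runOnce() {
        x = 0;
        y = 0;
        final int[] r = new int[4]; // r[0], r[1]: thread 1; r[2], r[3]: thread 3
        final CyclicBarrier start = new CyclicBarrier(4);
        Thread[] threads = {
            new Thread(() -> { awaitStart(start); x = 1; }),              // Thread 0
            new Thread(() -> { awaitStart(start); r[0] = x; r[1] = y; }), // Thread 1
            new Thread(() -> { awaitStart(start); y = 1; }),              // Thread 2
            new Thread(() -> { awaitStart(start); r[2] = y; r[3] = x; }), // Thread 3
        };
        for (Thread t : threads) {
            t.start();
        }
        try {
            // join() makes the writes to r[] visible to this thread.
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Disallowed: the two readers disagree on the order of the writes.
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }

    /** Counts disallowed outcomes over the given number of rounds. */
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            if (runOnce()) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("disallowed outcomes: " + countForbidden(2000));
    }
}
```

A run that never reports a disallowed outcome does not prove an implementation correct; a litmus test like this can only expose a violation, and on small machines the interesting interleavings may be rare.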
>>>>>>> From david.holmes at oracle.com Tue Jan 14 22:24:32 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 15 Jan 2014 16:24:32 +1000 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <52D51FAC.8060800@oracle.com> Message-ID: <52D629A0.4080806@oracle.com> On 15/01/2014 12:10 AM, Volker Simonis wrote: > On Tue, Jan 14, 2014 at 12:29 PM, David Holmes > wrote: > > Just a note on this part (I havent looked at the code): > > > On AIX, the constants used for the polling events (i.e. POLLIN, > POLLOUT, ...) are defined to different values than on other > operating systems. The problem is however, that these constants > are hardcoded as public final static members of various, shared > Java classes. > > > Sounds like this should be handled the same way that the other > "system constants" are handled - you can either store a platform > file in the repo (for cross-compiling) or you generate the class > containing the constants at build time. > > > Hi David, > > thanks for your comments. That sound like a good idea but I'm not sure > if it would make sense to duplicate the following files: > > src/share/classes/sun/nio/ch/AbstractPollArrayWrapper.java: > src/solaris/classes/sun/nio/ch/Port.java > > because of this. Do you have a concrete example where Java-classes are > being generated with different constants in the class library build? There are two files generated: UnixConstants.java (or SolarisConstants.java) for general I/O type values SocketOptionRegistry.java for socket options. See jdk/make/gensrc/GensrcMisc.gmk. > Both solutions would result in different class files on Aix and other > Unix variants. What do you think about assigning the concrete values > depending on 'os.name ' in the static initializers of > the corresponding classes? I think that shouldn't introduce too much > overhead and I could get rid of all the ugly conversion code. 
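[Editorial sketch] The idea floated in the quoted paragraph, assigning the poll constants in a static initializer depending on `os.name`, might look roughly like the following. Everything here is hypothetical: the class and field names are made up, and the numeric values are placeholders chosen only so that the two branches differ. They are not taken from any platform's poll.h.

```java
// Hypothetical sketch: choose event constants once, based on os.name.
final class PollConstantsSketch {
    /** Holder for the event bits of one platform. */
    static final class PollEvents {
        final short pollIn;
        final short pollOut;

        PollEvents(short pollIn, short pollOut) {
            this.pollIn = pollIn;
            this.pollOut = pollOut;
        }
    }

    /** Returns placeholder event bits for the given os.name value. */
    static PollEvents forOs(String osName) {
        if (osName != null && osName.startsWith("AIX")) {
            // Placeholder values, NOT the real AIX definitions.
            return new PollEvents((short) 0x0001, (short) 0x0002);
        }
        // Placeholder values for all other platforms.
        return new PollEvents((short) 0x0001, (short) 0x0004);
    }

    // Resolved once, when the class is initialized.
    static final PollEvents EVENTS = forOs(System.getProperty("os.name"));

    public static void main(String[] args) {
        System.out.println("POLLIN=" + EVENTS.pollIn + " POLLOUT=" + EVENTS.pollOut);
    }
}
```

The trade-off is that the constants are no longer compile-time constants, so callers cannot use them in `switch` labels or have them inlined by javac; the alternatives raised in the thread resolve the values at build time or in native code instead.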
I'm not a fan of runtime checks of this kind, though if it is only a very small number of values it might be okay. Another option would be to make those classes into "templates" as done with Version.java.template and substitute the right values at build time. But I'll let Alan and net-dev folk come back with their preferred technique for this. Cheers, David
>
> Regards,
> Volker
>
> David
>
> On 14/01/2014 6:40 PM, Volker Simonis wrote:
>
> Hi,
>
> could you please review the following changes for the ppc-aix-port stage/stage-9 repositories (the changes are planned for integration into ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage):
>
> http://cr.openjdk.java.net/~simonis/webrevs/8031581/
>
> I've built and smoke tested without any problems on Linux/x86_64 and PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64.
>
> With these changes (and together with the changes from "8028537: PPC64: Updated jdk/test scripts to understand the AIX os and environment" and "8031134 : PPC64: implement printing on AIX") our port passes all but the following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 baseline from www.java.net/download/jdk8/testresults/testresults.html):
>
> java/net/Inet6Address/B6558853.java
> java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically)
> java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java
> java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically)
> java/nio/channels/Selector/RacyDeregister.java
> sun/security/krb5/auto/Unreachable.java (only on IPv6)
>
> Thank you and best regards,
> Volker
>
> Following is a detailed description of the various changes:
>
> src/share/native/java/util/zip/zip_util.c
> src/share/native/sun/management/DiagnosticCommandImpl.c
>
> * According to ISO C it is perfectly legal for malloc to return zero if called with a zero argument. Fix various places where malloc can potentially correctly return zero because it was called with a zero argument.
> * Also fixed |DiagnosticCommandImpl.c| to include |stdlib.h|. This only fixes a compiler warning on Linux, but on AIX it prevents a VM crash later on because the return value of |malloc()| will be cast to |int| which is especially bad if that pointer was bigger than 32-bit.
>
> make/CompileJavaClasses.gmk
>
> * Also use |PollingWatchService| on AIX.
>
> make/lib/NioLibraries.gmk
> src/aix/native/sun/nio/ch/AixNativeThread.c
>
> * Put the implementation for the native methods of |NativeThread| into |AixNativeThread.c| on AIX.
>
> src/solaris/native/sun/nio/ch/PollArrayWrapper.c
> src/solaris/native/sun/nio/ch/Net.c
> src/aix/classes/sun/nio/ch/AixPollPort.java
> src/aix/native/sun/nio/ch/AixPollPort.c
> src/aix/native/java/net/aix_close.c
>
> * On AIX, the constants used for the polling events (i.e. |POLLIN|, |POLLOUT|, ...) are defined to different values than on other operating systems. The problem is, however, that these constants are hardcoded as public final static members of various shared Java classes. We therefore have to map them from Java to native every time before calling one of the native poll functions and back to Java after the call on AIX in order to get the right semantics.
>
> src/share/classes/java/nio/file/CopyMoveHelper.java
>
> * As discussed on the core-libs mailing list (see http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) it is not necessary to call |Files.getFileAttributeView()| with any |linkOptions| because at that place we've already checked that the target file can not be a symbolic link. This change makes the implementation more robust on platforms which support symbolic links but do not support the |O_NOFOLLOW| flag to the |open| system call. It also makes the JDK pass the |demo/zipfs/basic.sh| test on AIX.
>
> src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java
>
> * Support "compound text" on AIX in the same way as on other Unix platforms.
>
> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider
>
> * Define the correct attach provider for AIX.
>
> src/solaris/native/java/net/net_util_md.h
> src/solaris/native/sun/nio/ch/FileDispatcherImpl.c
> src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c
>
> * AIX needs a workaround for I/O cancellation (see: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). "..The |close()| subroutine is blocked until all subroutines which use the file descriptor return to usr space. For example, when a thread is calling close and another thread is calling select with the same file descriptor, the close subroutine does not return until the select call returns...". To fix this problem, we have to use the various |NET_| wrappers which are declared in |net_util_md.h| and defined in |aix_close.c| and we also need some additional wrappers for |fcntl()|, |read()| and |write()| on AIX.
>
>   While the current solution isn't really nice because it introduces some more AIX-specific sections in shared code, I think it is the best way to go for JDK 8 because it imposes the smallest possible changes and risks for the existing platforms. I'm ready to change the code to unconditionally use the wrappers for all platforms and implement the wrappers empty on platforms which don't need any wrapping. I think it would also be nice to clean up the names (e.g. |NET_Read()| is currently a wrapper for |recv()| and the |NET_| prefix is probably not appropriate any more so maybe change it to something like |IO_|). But again, I'd prefer to keep that as a follow-up change for JDK9.
> * Calling |fsync()| on a "read-only" file descriptor on AIX will result in an error (i.e. "EBADF: The FileDescriptor parameter is not a valid file descriptor open for writing."). To prevent this error we have to query if the corresponding file descriptor is writeable. Notice that at that point we can not access the |writable| attribute of the corresponding file channel so we have to use |fcntl()|.
>
> src/solaris/classes/java/lang/UNIXProcess.java.aix
>
> * On AIX the implementation is especially tricky, because the |close()| system call will block if another thread is at the same time blocked in a file operation (e.g. 'read()') on the same file descriptor. We therefore combine the AIX |ProcessPipeInputStream| implementation with the |DeferredCloseInputStream| approach used on Solaris (see |UNIXProcess.java.solaris|). This means that every potentially blocking operation on the file descriptor increments a counter before it is executed and decrements it once it finishes. The 'close()' operation will only be executed if there are no pending operations. Otherwise it is deferred until after the last pending operation has finished.
>
> src/share/transport/socket/socketTransport.c
>
> * On AIX we have to call |shutdown()| on a socket descriptor before closing it, otherwise the |close()| call may be blocked. This is the same problem as described before. Unfortunately the JDI framework doesn't use the same IO wrappers as other class library components so we can not easily use the |NET_| abstractions from |aix_close.c| here.
> * Without this small change all JDI regression tests will fail on AIX because of the way the tests act as a "debugger" which launches another VM (the "debuggee") which connects itself back to the debugger. In this scenario the "debuggee" can not shut itself down because one thread will always be blocked in the |close()| call on one of the communication sockets.
>
> src/solaris/native/java/net/NetworkInterface.c
>
> * Set the scope identifier for IPv6 addresses on AIX.
>
> src/solaris/native/java/net/net_util_md.c
>
> * It turns out that we do not always have to replace |SO_REUSEADDR| on AIX by |SO_REUSEPORT|. Instead we can simply use the same approach as BSD and only use |SO_REUSEPORT| additionally, if several datagram sockets try to bind to the same port.
> * Also fixed a comment and removed unused local variables.
> * Fixed the obviously inverted assignment |newTime = prevTime;| which should read |prevTime = newTime;|. Otherwise |prevTime| will never change and the timeout will potentially be reached too fast.
>
> src/solaris/native/sun/management/OperatingSystemImpl.c
>
> * AIX does not understand |/proc/self| so we have to query the real process ID to access the proc file system.
>
> src/solaris/native/sun/nio/ch/DatagramChannelImpl.c
>
> * On AIX, |connect()| may legally return |EAFNOSUPPORT| if called on a socket with the address family set to |AF_UNSPEC|.
>
From staffan.larsen at oracle.com Wed Jan 15 00:57:31 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 15 Jan 2014 09:57:31 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: Message-ID: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com> Volker, I've looked at the following files: src/share/native/sun/management/DiagnosticCommandImpl.c: nit: "legel" -> "legal" (two times) In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if you allow dcmd_info_array to become NULL, then jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you need to check that. src/solaris/native/sun/management/OperatingSystemImpl.c No comments.
src/share/transport/socket/socketTransport.c No comments. src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider No comments. Thanks, /Staffan On 14 jan 2014, at 09:40, Volker Simonis wrote: > Hi, > > could you please review the following changes for the ppc-aix-port stage/stage-9 repositories (the changes are planned for integration into ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): > > http://cr.openjdk.java.net/~simonis/webrevs/8031581/ > > I've build and smoke tested without any problems on Linux/x86_64 and PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. > > With these changes (and together with the changes from "8028537: PPC64: Updated jdk/test scripts to understand the AIX os and environment" and "8031134 : PPC64: implement printing on AIX") our port passes all but the following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 baseline from www.java.net/download/jdk8/testresults/testresults.html?): > > java/net/Inet6Address/B6558853.java > java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) > java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java > java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) > java/nio/channels/Selector/RacyDeregister.java > sun/security/krb5/auto/Unreachable.java (only on IPv6) > > Thank you and best regards, > Volker > > > Following a detailed description of the various changes: > src/share/native/java/util/zip/zip_util.c > src/share/native/sun/management/DiagnosticCommandImpl.c > > According to ISO C it is perfectly legal for malloc to return zero if called with a zero argument. Fix various places where malloc can potentially correctly return zero because it was called with a zero argument. > Also fixed DiagnosticCommandImpl.c to include stdlib.h. 
This only fixes a compiler warning on Linux, but on AIX it prevents a VM crash later on because the return value of malloc() will be casted to int which is especially bad if that pointer was bigger than 32-bit. > make/CompileJavaClasses.gmk > > Also use PollingWatchService on AIX. > make/lib/NioLibraries.gmk > src/aix/native/sun/nio/ch/AixNativeThread.c > > Put the implementation for the native methods of NativeThread into AixNativeThread.c on AIX. > src/solaris/native/sun/nio/ch/PollArrayWrapper.c > src/solaris/native/sun/nio/ch/Net.c > src/aix/classes/sun/nio/ch/AixPollPort.java > src/aix/native/sun/nio/ch/AixPollPort.c > src/aix/native/java/net/aix_close.c > > On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT, ...) are defined to different values than on other operating systems. The problem is however, that these constants are hardcoded as public final static members of various, shared Java classes. We therefore have to map them from Java to native every time before calling one of the native poll functions and back to Java after the call on AIX in order to get the right semantics. > src/share/classes/java/nio/file/CopyMoveHelper.java > > As discussed on the core-libs mailing list (see http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) it is not necessary to call Files.getFileAttributeView() with any linkOptions because at that place we've already checked that the target file can not be a symbolic link. This change makes the implementation more robust on platforms which support symbolic links but do not support the O_NOFOLLOW flag to the open system call. It also makes the JDK pass the demo/zipfs/basic.sh test on AIX. > src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java > > Support "compound text" on AIX in the same way like on other Unix platforms. > src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider > > Define the correct attach provider for AIX. 
> src/solaris/native/java/net/net_util_md.h > src/solaris/native/sun/nio/ch/FileDispatcherImpl.c > src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c > > AIX needs a workaround for I/O cancellation (see: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). "..The close() subroutine is blocked until all subroutines which use the file descriptor return to usr space. For example, when a thread is calling close and another thread is calling select with the same file descriptor, the close subroutine does not return until the select call returns...". To fix this problem, we have to use the various NET_ wrappers which are declared in net_util_md.h and defined in aix_close.c and we also need some additional wrappers for fcntl(), read() and write() on AIX. > While the current solution isn't really nice because it introduces some more AIX-specifc sections in shared code, I think it is the best way to go for JDK 8 because it imposes the smallest possible changes and risks for the existing platforms. I'm ready to change the code to unconditionally use the wrappers for all platforms and implement the wrappers empty on platforms which don't need any wrapping. I think it would also be nice to clean up the names (e.g. NET_Read() is currently a wrapper for recv() and the NET_ prefix is probably not appropriate any more so maybe change it to something like IO_). But again, I'll prefer to keep that as a follow up change for JDK9. > Calling fsync() on a "read-only" file descriptor on AIX will result in an error (i.e. "EBADF: The FileDescriptor parameter is not a valid file descriptor open for writing."). To prevent this error we have to query if the corresponding file descriptor is writeable. Notice that at that point we can not access the writable attribute of the corresponding file channel so we have to use fcntl(). 
> src/solaris/classes/java/lang/UNIXProcess.java.aix > > On AIX the implementation is especially tricky, because the close() system call will block if another thread is at the same time blocked in a file operation (e.g. 'read()') on the same file descriptor. We therefore combine the AIX ProcessPipeInputStream implemenatation with the DeferredCloseInputStream approach used on Solaris (see UNIXProcess.java.solaris). This means that every potentially blocking operation on the file descriptor increments a counter before it is executed and decrements it once it finishes. The 'close()' operation will only be executed if there are no pending operations. Otherwise it is deferred after the last pending operation has finished. > src/share/transport/socket/socketTransport.c > > On AIX we have to call shutdown() on a socket descriptor before closing it, otherwise the close() call may be blocked. This is the same problem as described before. Unfortunately the JDI framework doesn't use the same IO wrappers like other class library components so we can not easily use the NET_ abstractions from aix_close.c here. > Without this small change all JDI regression tests will fail on AIX because of the way how the tests act as a "debugger" which launches another VM (the "debugge") which connects itself back to the debugger. In this scenario the "debugge" can not shut down itself because one thread will always be blocked in the close() call on one of the communication sockets. > src/solaris/native/java/net/NetworkInterface.c > > Set the scope identifier for IPv6 addresses on AIX. > src/solaris/native/java/net/net_util_md.c > > It turns out that we do not always have to replace SO_REUSEADDR on AIX by SO_REUSEPORT. Instead we can simply use the same approach like BSD and only use SO_REUSEPORT additionally, if several datagram sockets try to bind to the same port. > Also fixed a comment and removed unused local variables. 
> Fixed the obviously inverted assignment newTime = prevTime; which should read prevTime = newTime;. Otherwise prevTime will never change and the timeout will potentially be reached too fast. > src/solaris/native/sun/management/OperatingSystemImpl.c > > AIX does not understand /proc/self so we have to query the real process ID to access the proc file system. > src/solaris/native/sun/nio/ch/DatagramChannelImpl.c > > On AIX, connect() may legally return EAFNOSUPPORT if called on a socket with the address family set to AF_UNSPEC. >
From Alan.Bateman at oracle.com Wed Jan 15 01:03:17 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 15 Jan 2014 09:03:17 +0000 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: <52D629A0.4080806@oracle.com> References: <52D51FAC.8060800@oracle.com> <52D629A0.4080806@oracle.com> Message-ID: <52D64ED5.4020409@oracle.com> On 15/01/2014 06:24, David Holmes wrote: > > I'm not a fan of runtime checks of this kind though if it is only a > very small number of values it might be okay. > > Another option would be to make those classes into "templates" as done > with Version.java.template and substitute the right values at build time. > > But I'll let Alan and net-dev folk come back with their preferred > technique for this. > I plan to spend time on Volker's webrev later in the week (just too busy with other things right now). As for the translation issue, it's an oversight in the original implementation; it just hasn't come up before (to my knowledge anyway). The simplest solution here may be to just move them to sun.net.ch.Net and have them initialized to their native value.
In general then I'm not too concerned about that one, it's the changes to support async close on AIX that are leaping out at me. -Alan
From volker.simonis at gmail.com Wed Jan 15 03:05:14 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 15 Jan 2014 12:05:14 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: <52D64ED5.4020409@oracle.com> References: <52D51FAC.8060800@oracle.com> <52D629A0.4080806@oracle.com> <52D64ED5.4020409@oracle.com> Message-ID: On Wed, Jan 15, 2014 at 10:03 AM, Alan Bateman wrote: > On 15/01/2014 06:24, David Holmes wrote: > >> >> I'm not a fan of runtime checks of this kind though if it is only a very >> small number of values it might be okay. >> >> Another option would be to make those classes into "templates" as done >> with Version.java.template and substitute the right values at build time. >> >> But I'll let Alan and net-dev folk come back with their preferred >> technique for this. >> > I plan to spend time on Volker's webrev later in the week (just too busy > with other things right now). As for the translation issue, it's an > oversight in the original implementation; it just hasn't come up before (to > my knowledge anyway). The simplest solution here may be to just move them > to sun.net.ch.Net and have them initialized to their native value. You mean sun.nio.ch.Net, right? Do you propose to completely remove the definitions of the POLL constants from: src/share/classes/sun/nio/ch/AbstractPollArrayWrapper.java src/solaris/classes/sun/nio/ch/Port.java and replace all their usages by Net.POLL* ? > In general then I'm not too concerned about that one, it's the changes to > support async close on AIX that are leaping out at me. > > -Alan >
From goetz.lindenmaier at sap.com Wed Jan 15 07:28:15 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 15 Jan 2014 15:28:15 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52D5DC80.1040003@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> Hi David, I updated the webrev: http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
- I removed the IRIW example in parse3.cpp
- I adapted the comments not to point to that comment, and to reflect the new flagging. Also I mention that we support the volatile constructor issue, but that it's not standard.
- I protected issuing the barrier for the constructor by PPC64. I also think it's better to separate these this way.
Thanks for your comments! Best regards, Goetz. -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Mittwoch, 15.
Januar 2014 01:55 To: Lindenmaier, Goetz Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Hi Goetz, Sorry for the delay in getting back to this. The general changes to the volatile barriers to support IRIW are okay. The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more specifically it is not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of the commentary excessive, particularly for shared code. In particular the IRIW example in parse3.cpp - it seems a strange place to give the explanation and I don't think we need it to that level of detail. Seems to me that is present is globalDefinitions_ppc.hpp is quite adequate. The changes related to volatile writes in the constructor, as discussed are not required by the Java Memory Model. If you want to keep these then I think they should all be guarded with PPC64 because it is not related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the PPC64 porters. Thanks, David On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: > Hi, > > I updated this webrev. I detected a small flaw I made when editing this version. > The #endif in line 322, parse3.cpp was in the wrong line. > I also based the webrev on the latest version of the stage repo. > http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ > > Best regards, > Goetz. > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Freitag, 20. Dezember 2013 13:47 > To: David Holmes > Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' > Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Hi David, > >> So we can at least undo #4 now we have established those tests were not >> required to pass. > We would prefer if we could keep this in. 
We want to avoid that it's > blamed on the VM if java programs are failing on PPC after they worked > on x86. To clearly mark it as overfulfilling the spec I would guard it by > a flag as proposed. But if you insist I will remove it. Also, this part is > not that performance relevant. > >> A compile-time guard (ifdef) would be better than a runtime one I think > I added a compile-time guard in this new webrev: > http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ > I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces > several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC) > but this way I only have to change the ppc platform. > > Best regards, > Goetz > > P.S.: I will also be available over the Christmas period. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Freitag, 20. Dezember 2013 05:58 > To: Lindenmaier, Goetz > Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Sorry for the delay, it takes a while to catch up after two weeks > vacation :) Next vacation (ie next two weeks) I'll continue to check emails. > > On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> ok, I understand the tests are wrong. It's good this issue is settled. >> Thanks Aleksey and Andreas for going into the details of the proof! >> >> About our change: David, the causality is the other way round. >> The change is about IRIW. >> 1. To pass IRIW, we must use sync instructions before loads. > > This is the part I still have some question marks over as the > implications are not nice for performance on non-TSO platforms. But I'm > no further along in processing that paper I'm afraid. > >> 2. If we do syncs before loads, we don't need to do them after stores. >> 3. If we don't do them after stores, we fail the volatile constructor tests. >> 4. 
So finally we added them again at the end of the constructor after stores >> to pass the volatile constructor tests. > > So we can at least undo #4 now we have established those tests were not > required to pass. > >> We originally passed the constructor tests because the ppc memory order >> instructions are not as find-granular as the >> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction >> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the >> MemBarVolatile after the store fixes the constructor tests. The proper representation >> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless >> anyways. >> >>> I'm not happy with the ifdef approach but I won't block it. >> I'd be happy to add a property >> OrderAccess::cpu_is_multiple_copy_atomic() > > A compile-time guard (ifdef) would be better than a runtime one I think > - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic > based not architecture based) as that will allows for turning this > on/off for any architecture for testing purposes. > > Thanks, > David > >> or the like to guard the customization. I'd like that much better. Or also >> OrderAccess::needs_support_iriw_ordering() >> VM_Version::needs_support_iriw_ordering() >> >> >> Best regards, >> Goetz. >> >> >> >> >> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 28. November 2013 00:34 >> To: Lindenmaier, Goetz >> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> TL;DR version: >> >> Discussion on the c-i list has now confirmed that a constructor-barrier >> for volatiles is not required as part of the JMM specification. It *may* >> be required in an implementation that doesn't pre-zero memory to ensure >> you can't see uninitialized fields. 
So the tests for this are invalid >> and this part of the patch is not needed in general (ppc64 may need it >> due to other factors). >> >> Re: "multiple copy atomicity" - first thanks for correcting the term :) >> Second thanks for the reference to that paper! For reference: >> >> "The memory system (perhaps involving a hierarchy of buffers and a >> complex interconnect) does not guarantee that a write becomes visible to >> all other hardware threads at the same time point; these architectures >> are not multiple-copy atomic." >> >> This is the visibility issue that I referred to and affects both ARM and >> PPC. But of course it is normally handled by using suitable barriers >> after the stores that need to be visible. I think the crux of the >> current issue is what you wrote below: >> >> > The fixes for the constructor issue are only needed because we >> > remove the sync instruction from behind stores (parse3.cpp:320) >> > and place it before loads. >> >> I hadn't grasped this part. Obviously if you fail to do the sync after >> the store then you have to do something around the loads to get the same >> results! I still don't know what lead you to the conclusion that the >> only way to fix the IRIW issue was to put the fence before the load - >> maybe when I get the chance to read that paper in full it will be clearer. >> >> So ... the basic problem is that the current structure in the VM has >> hard-wired one choice of how to get the right semantics for volatile >> variables. You now want to customize that but not all the requisite >> hooks are present. It would be better if volatile_load and >> volatile_store were factored out so that they could be implemented as >> desired per-platform. Alternatively there could be pre- and post- hooks >> that could then be customized per platform. Otherwise you need >> platform-specific ifdef's to handle it as per your patch. >> >> I'm not happy with the ifdef approach but I won't block it. 
I think this >> is an area where a lot of clean up is needed in the VM. The barrier >> abstractions are a confused mess in my opinion. >> >> Thanks, >> David >> ----- >> >> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I updated the webrev to fix the issues mentioned by Vladimir: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>> >>> I did not yet add the >>> OrderAccess::needs_support_iriw_ordering() >>> VM_Version::needs_support_iriw_ordering() >>> or >>> OrderAccess::cpu_is_multiple_copy_atomic() >>> to reduce #defined, as I got no further comment on that. >>> >>> >>> WRT to the validity of the tests and the interpretation of the JMM >>> I feel not in the position to contribute substantially. >>> >>> But we would like to pass the torture test suite as we consider >>> this a substantial task in implementing a PPC port. Also we think >>> both tests show behavior a programmer would expect. It's bad if >>> Java code runs fine on the more common x86 platform, and then >>> fails on ppc. This will always first be blamed on the VM. >>> >>> The fixes for the constructor issue are only needed because we >>> remove the sync instruction from behind stores (parse3.cpp:320) >>> and place it before loads. Then there is no sync between volatile store >>> and publishing the object. So we add it again in this one case >>> (volatile store in constructor). >>> >>> >>> @David >>>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>>> and that's being discussed now for almost a year. >>>> It may have started a year ago but work on it has hardly been continuous. >>> That's not true, we did a lot of investigation and testing on this issue. >>> And we came up with a solution we consider the best possible. If you >>> have objections, you should at least give the draft of a better solution, >>> we would volunteer to implement and test it. >>> Similarly, we invested time in fixing the concurrency torture issues. 
>>> >>> @David >>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>> can't find any reference to it. >>> We learned about this reading "A Tutorial Introduction to the ARM and >>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for >>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the taskqueue problem. >>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>> >>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read' >>> instead. Sorry for that. (I also fixed that in the method name above). >>> >>> Best regards and thanks for all your involvements, >>> Goetz. >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Mittwoch, 27. November 2013 12:53 >>> To: Lindenmaier, Goetz >>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>> Hi David, >>>> >>>> -- Volatile in constructor >>>>> AFAIK we have not seen those tests fail due to a >>>>> missing constructor barrier. >>>> We see them on PPC64. Our test machines have typically 8-32 processors >>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!) >>> >>> And see follow ups - the tests are invalid. >>> >>>> -- IRIW issue >>>>> I can not possibly answer to the necessary level of detail with a few >>>>> moments thought. >>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>> and that's being discussed now for almost a year. >>> >>> It may have started a year ago but work on it has hardly been continuous.
>>> >>>>> You are implying there is a problem here that will >>>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>>> solution with the #defines, and that's correct for all, but not nice, I admit. >>>> (I don't really know about ARM, though). >>>> So if I can write down a nicer solution testing for methods that are evaluated >>>> by the C-compiler I'm happy. >>>> >>>> The problem is not that IRIW is not handled by the JMM, the problem >>>> is that >>>> store >>>> sync >>>> does not assure multiple-read-atomicity, >>>> only >>>> sync >>>> load >>>> does so on PPC. And you require multiple-read-atomicity to >>>> pass that test. >>> >>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>> can't find any reference to it. >>> >>> Thanks, >>> David >>> >>> The JMM is fine. And >>>> store >>>> MemBarVolatile >>>> is fine on x86, sparc etc. as there exist assembler instructions that >>>> do what is required. >>>> >>>> So if you are off soon, please let's come to a solution that >>>> might be improvable in the way it's implemented, but that >>>> allows us to implement a correct PPC64 port. >>>> >>>> Best regards, >>>> Goetz. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>> To: Lindenmaier, Goetz >>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> Hi Goetz, >>>> >>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>> Hi everybody, >>>>> >>>>> thanks a lot for the detailed reviews! >>>>> I'll try to answer to all in one mail. 
>>>>> >>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>>> We don't think it's correct if we omit the barrier after initializing >>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>>> and Doug Lea, and they agreed. >>>>> Also, concurrency torture tests >>>>> LongVolatileTest >>>>> AtomicIntegerInitialValueTest >>>>> will fail. >>>>> (In addition, observing 0 instead of the initial value of a volatile field would be >>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>>> >>>> The effects of unsafe publication are always surprising - volatiles do >>>> not add anything special here. AFAIK there is nothing in the JMM that >>>> requires the constructor barrier - discussions with Doug and Aleksey >>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>> missing constructor barrier. >>>> >>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>>> see a way to implement this cheaper. >>>>> >>>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>>> Basically, I agree on this. But you also have to take into account >>>>> that due to the different memory ordering instructions on different platforms >>>>> just implementing something empty is not sufficient. >>>>> An example: >>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>> MemBarVolatile // means StoreLoad barrier >>>>> If these are consecutively in the code, sparc code looks like this: >>>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>> Just doing what is required.
>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>> fine grained operations: >>>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>>> I need an additional optimization that removes the lwsync. I can not implement >>>>> MemBarRelease empty, as it is also used independently. >>>>> >>>>> Back to the IRIW problem. I think here we have a comparable issue. >>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>> >>>>> I would propose to guard the code by >>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>> Else, David, how would you propose to implement this platform independent? >>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>> >>>> I can not possibly answer to the necessary level of detail with a few >>>> moments thought. You are implying there is a problem here that will >>>> impact numerous platforms (unless you can tell me why ppc is so >>>> different?) and I can not take that on face value at the moment. The >>>> only reason I can see IRIW not being handled by the JMM requirements for >>>> volatile accesses is if there are global visibility issues that are not >>>> addressed - but even then I would expect heavy barriers at the store >>>> would deal with that, not at the load. (This situation reminds me of the >>>> need for read-barriers on Alpha architecture due to the use of software >>>> cache-coherency rather than hardware cache-coherency - but we don't have >>>> that on ppc!) >>>> >>>> Sorry - There is no quick resolution here and in a couple of days I will >>>> be heading out on vacation for two weeks. >>>> >>>> David >>>> ----- >>>> >>>>> Best regards, >>>>> Goetz. 
>>>>> >>>>> -- Other ports: >>>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>> >>>>> -- MemBarStoreStore after initialization >>>>> I agree we should not change it in the ppc port. If you wish, I can >>>>> prepare an extra webrev for hotspot-comp. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>> To: Vladimir Kozlov >>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>> >>>>> Okay this is my second attempt at answering this in a reasonable way :) >>>>> >>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>> I have to ask David to do correctness evaluation. >>>>> >>>>> From what I understand what we see here is an attempt to fix an >>>>> existing issue with the implementation of volatiles so that the IRIW >>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>>> load. >>>>> >>>>> Now if this was purely handled in ppc64 source code then I would be >>>>> happy to let them do whatever they like (surely this kills performance >>>>> though!). But I do not agree with the changes to the shared code that >>>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>>> polluting the shared code. My concern is similar to what I said with the >>>>> taskQueue changes - these algorithms should be expressed using the >>>>> correct OrderAccess operations to guarantee the desired properties >>>>> independent of architecture. 
If such a "barrier" is not needed on a >>>>> given architecture then the implementation in OrderAccess should reduce >>>>> to a no-op. >>>>> >>>>> And as Vitaly points out the constructor barriers are not needed under >>>>> the JMM. >>>>> >>>>>> I am fine with suggested changes because you did not change our current >>>>>> code for our platforms (please, do not change do_exits() now). >>>>>> But may be it should be done using more general query which is set >>>>>> depending on platform: >>>>>> >>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>> >>>>>> or similar to what we use now: >>>>>> >>>>>> VM_Version::needs_support_iriw_ordering() >>>>> >>>>> Every platform has to support IRIW this is simply part of the Java >>>>> Memory Model, there should not be any need to call this out explicitly >>>>> like this. >>>>> >>>>> Is there some subtlety of the hardware I am missing here? Are there >>>>> visibility issues beyond the ordering constraints that the JMM defines? >>>>>> From what I understand our ppc port is also affected. David? >>>>> >>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> In library_call.cpp can you add {}? New comment should be inside else {}. >>>>>> >>>>>> I think you should make _wrote_volatile field not ppc64 specific which >>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY() >>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs. >>>>>> >>>>>> In do_put_xxx() can you combine your changes: >>>>>> >>>>>> if (is_vol) { >>>>>> // See comment in do_get_xxx(). >>>>>> #ifndef PPC64 >>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>> #else >>>>>> if (is_field) { >>>>>> // Add MemBarRelease for constructors which write volatile field >>>>>> (PPC64). 
>>>>>>       set_wrote_volatile(true);
>>>>>>     }
>>>>>> #endif
>>>>>>   }
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>> the torture test suite:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>
>>>>>>> Example:
>>>>>>> volatile x=0, y=0
>>>>>>>  __________    __________    __________    __________
>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>
>>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>                 read(y)                     read(x)
>>>>>>>
>>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>>
>>>>>>>
>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>
>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>
>>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>>> written in constructors are visible before the reference gets
>>>>>>> published.
>>>>>>>
>>>>>>>
>>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>>>> strong.
>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Please review and test this change.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
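The IRIW example above can be sketched in plain Java as follows. This is only a minimal illustration and not the VolatileIRIWTest from the torture test suite; the class and method names (IriwSketch, runOnce, countDisallowed) are made up for this sketch. It assumes that two volatile int fields and four short-lived threads are enough to show the shape of the test.

```java
import java.util.concurrent.CountDownLatch;

class IriwSketch {
    // Both fields are volatile, matching "volatile x=0, y=0" in the example.
    static volatile int x, y;

    // Blocks until the latch opens so that all four threads start together.
    static void awaitStart(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the four threads once; returns true if the disallowed outcome was seen.
    static boolean runOnce() {
        x = 0;
        y = 0;
        final int[] r = new int[4]; // r[0], r[1]: Thread 1; r[2], r[3]: Thread 3
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = {
            new Thread(() -> { awaitStart(start); x = 1; }),               // Thread 0
            new Thread(() -> { awaitStart(start); r[0] = x; r[1] = y; }),  // Thread 1
            new Thread(() -> { awaitStart(start); y = 1; }),               // Thread 2
            new Thread(() -> { awaitStart(start); r[2] = y; r[3] = x; }),  // Thread 3
        };
        for (Thread t : threads) {
            t.start();
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join(); // join() makes the writes to r visible to this thread
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // Disallowed: Thread 1 sees x=1, y=0 while Thread 3 sees y=1, x=0.
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }

    // Repeats the experiment and counts how often the disallowed outcome shows up.
    static int countDisallowed(int iterations) {
        int seen = 0;
        for (int i = 0; i < iterations; i++) {
            if (runOnce()) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("disallowed outcomes: " + countDisallowed(2000));
    }
}
```

Because the threads are created anew for every iteration, the race window is tiny, so a clean run of this sketch says very little about a port. A dedicated stress harness such as the torture test suite is needed to actually provoke the reordering.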
>>>>>>>
From volker.simonis at gmail.com Wed Jan 15 08:34:58 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 15 Jan 2014 17:34:58 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com>
References: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com>
Message-ID:

Hi Staffan,

thanks for the review. Please find my comments inline:

On Wed, Jan 15, 2014 at 9:57 AM, Staffan Larsen wrote:
> Volker,
>
> I've looked at the following files:
>
> src/share/native/sun/management/DiagnosticCommandImpl.c:
> nit: 'legel' -> 'legal' (two times)
> In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if
> you allow dcmd_info_array to become NULL,
> then jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you
> need to check that.
>

Good catch. I actually had problems with malloc returning NULL in
'getDiagnosticCommandArgumentInfoArray()' and then changed all other
potentially dangerous locations which used the same pattern.

However I think if the 'dcmd_info_array' has zero length it would be
perfectly fine to return a zero length array. So what about the following
solution:

  dcmdInfoCls = (*env)->FindClass(env, "sun/management/DiagnosticCommandInfo");
  num_commands = (*env)->GetArrayLength(env, commands);
  if (num_commands = 0) {
      result = (*env)->NewObjectArray(env, 0, dcmdInfoCls, NULL);
      if (result == NULL) {
          JNU_ThrowOutOfMemoryError(env, 0);
      }
      else {
          return result;
      }
  }
  dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo));
  if (dcmd_info_array == NULL) {
      JNU_ThrowOutOfMemoryError(env, NULL);
  }
  jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array);
  result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL);

That seems easier and saves me from handling the exception.

What do you think?

> src/solaris/native/sun/management/OperatingSystemImpl.c
> No comments.
> > src/share/transport/socket/socketTransport.c > No comments. > > > src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider > No comments. > > > Thanks, > /Staffan > > > > On 14 jan 2014, at 09:40, Volker Simonis wrote: > > Hi, > > could you please review the following changes for the ppc-aix-port > stage/stage-9 repositories (the changes are planned for integration into > ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): > > http://cr.openjdk.java.net/~simonis/webrevs/8031581/ > > I've build and smoke tested without any problems on Linux/x86_64 and > PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. > > With these changes (and together with the changes from "8028537: PPC64: > Updated jdk/test scripts to understand the AIX os and environment" and > "8031134 : PPC64: implement printing on AIX") our port passes all but the > following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 > baseline from www.java.net/download/jdk8/testresults/testresults.html?): > > java/net/Inet6Address/B6558853.java > java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) > java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java > java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) > java/nio/channels/Selector/RacyDeregister.java > sun/security/krb5/auto/Unreachable.java (only on IPv6) > > Thank you and best regards, > Volker > > > Following a detailed description of the various changes: > src/share/native/java/util/zip/zip_util.c > src/share/native/sun/management/DiagnosticCommandImpl.c > > - According to ISO C it is perfectly legal for malloc to return zero > if called with a zero argument. Fix various places where malloc can > potentially correctly return zero because it was called with a zero > argument. > - Also fixed DiagnosticCommandImpl.c to include stdlib.h. 
This only > fixes a compiler warning on Linux, but on AIX it prevents a VM crash later > on because the return value of malloc() will be casted to int which is > especially bad if that pointer was bigger than 32-bit. > > make/CompileJavaClasses.gmk > > - Also use PollingWatchService on AIX. > > make/lib/NioLibraries.gmk > src/aix/native/sun/nio/ch/AixNativeThread.c > > - Put the implementation for the native methods of NativeThread into > AixNativeThread.c on AIX. > > src/solaris/native/sun/nio/ch/PollArrayWrapper.c > src/solaris/native/sun/nio/ch/Net.c > src/aix/classes/sun/nio/ch/AixPollPort.java > src/aix/native/sun/nio/ch/AixPollPort.c > src/aix/native/java/net/aix_close.c > > - On AIX, the constants used for the polling events (i.e. POLLIN, > POLLOUT, ...) are defined to different values than on other operating > systems. The problem is however, that these constants are hardcoded as > public final static members of various, shared Java classes. We therefore > have to map them from Java to native every time before calling one of the > native poll functions and back to Java after the call on AIX in order to > get the right semantics. > > src/share/classes/java/nio/file/CopyMoveHelper.java > > - As discussed on the core-libs mailing list (see > http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) > it is not necessary to call Files.getFileAttributeView() with any > linkOptions because at that place we've already checked that the > target file can not be a symbolic link. This change makes the > implementation more robust on platforms which support symbolic links but do > not support the O_NOFOLLOW flag to the open system call. It also makes > the JDK pass the demo/zipfs/basic.sh test on AIX. > > src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java > > - Support "compound text" on AIX in the same way like on other Unix > platforms. 
> > > src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider > > - Define the correct attach provider for AIX. > > src/solaris/native/java/net/net_util_md.h > src/solaris/native/sun/nio/ch/FileDispatcherImpl.c > src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c > > - AIX needs a workaround for I/O cancellation (see: > http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). > "..The close() subroutine is blocked until all subroutines which use > the file descriptor return to usr space. For example, when a thread is > calling close and another thread is calling select with the same file > descriptor, the close subroutine does not return until the select call > returns...". To fix this problem, we have to use the various NET_ wrappers which are declared in > net_util_md.h and defined in aix_close.c and we also need some > additional wrappers for fcntl(), read() and write() on AIX. > While the current solution isn't really nice because it introduces > some more AIX-specific sections in shared code, I think it is the best way > to go for JDK 8 because it imposes the smallest possible changes and risks > for the existing platforms. I'm ready to change the code to unconditionally > use the wrappers for all platforms and implement the wrappers empty on > platforms which don't need any wrapping. I think it would also be nice to > clean up the names (e.g. NET_Read() is currently a wrapper for recv() and the > NET_ prefix is probably not appropriate any more so maybe change it to > something like IO_). But again, I'd prefer to keep that as a follow > up change for JDK9. > - Calling fsync() on a "read-only" file descriptor on AIX will result > in an error (i.e. "EBADF: The FileDescriptor parameter is not a valid file > descriptor open for writing."). To prevent this error we have to query if > the corresponding file descriptor is writeable.
Notice that at that point > we can not access the writable attribute of the corresponding file > channel so we have to use fcntl(). > > src/solaris/classes/java/lang/UNIXProcess.java.aix > > - On AIX the implementation is especially tricky, because the close() system call will block if another thread is at the same time blocked in a > file operation (e.g. 'read()') on the same file descriptor. We therefore > combine the AIX ProcessPipeInputStream implementation with the > DeferredCloseInputStream approach used on Solaris (see > UNIXProcess.java.solaris). This means that every potentially blocking > operation on the file descriptor increments a counter before it is executed > and decrements it once it finishes. The 'close()' operation will only be > executed if there are no pending operations. Otherwise it is deferred until after > the last pending operation has finished. > > src/share/transport/socket/socketTransport.c > > - On AIX we have to call shutdown() on a socket descriptor before > closing it, otherwise the close() call may be blocked. This is the > same problem as described before. Unfortunately the JDI framework doesn't > use the same IO wrappers as other class library components so we can not > easily use the NET_ abstractions from aix_close.c here. > - Without this small change all JDI regression tests will fail on AIX > because of the way the tests act as a "debugger" which launches another > VM (the "debuggee") which connects itself back to the debugger. In this > scenario the "debuggee" can not shut down itself because one thread will > always be blocked in the close() call on one of the communication > sockets. > > src/solaris/native/java/net/NetworkInterface.c > > - Set the scope identifier for IPv6 addresses on AIX. > > src/solaris/native/java/net/net_util_md.c > > - It turns out that we do not always have to replace SO_REUSEADDR on > AIX by SO_REUSEPORT.
Instead we can simply use the same approach as > BSD and only use SO_REUSEPORT additionally, if several datagram > sockets try to bind to the same port. > - Also fixed a comment and removed unused local variables. > - Fixed the obviously inverted assignment newTime = prevTime; which > should read prevTime = newTime;. Otherwise prevTime will never change > and the timeout will potentially be reached too fast. > > src/solaris/native/sun/management/OperatingSystemImpl.c > > - AIX does not understand /proc/self so we have to query the real > process ID to access the proc file system. > > src/solaris/native/sun/nio/ch/DatagramChannelImpl.c > > - On AIX, connect() may legally return EAFNOSUPPORT if called on a > socket with the address family set to AF_UNSPEC. > > > > From volker.simonis at gmail.com Wed Jan 15 08:42:47 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 15 Jan 2014 17:42:47 +0100 Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX In-Reply-To: <52D58C55.6070004@oracle.com> References: <52D58C55.6070004@oracle.com> Message-ID: Hi Alan, thanks for the suggestion. That's fine for me. I've copied the empty SCTP stubs from the macosx to the aix directory as well and updated the make file accordingly (in the patch for "8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests"). Therefore, the changes to the three tests: test/com/sun/nio/sctp/SctpChannel/Util.java test/com/sun/nio/sctp/SctpMultiChannel/Util.java test/com/sun/nio/sctp/SctpServerChannel/Util.java can be considered obsolete.
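A minimal Java sketch of what such an "empty" stub and a matching availability probe might look like. It assumes the stubs do nothing but reject use with UnsupportedOperationException; the real files under the macosx directory are not reproduced here, and every name below (SctpStubSketch, UnsupportedSctpChannel, isSctpSupported) is hypothetical.

```java
class SctpStubSketch {
    // Hypothetical stand-in for a platform class that has no native SCTP support.
    static final class UnsupportedSctpChannel {
        private static final String MESSAGE = "SCTP not supported on this platform";

        UnsupportedSctpChannel() {
            // No native library is loaded, so neither UnsatisfiedLinkError nor
            // NoClassDefFoundError can occur; callers only ever see this exception.
            throw new UnsupportedOperationException(MESSAGE);
        }
    }

    // Probe in the style of a test helper: try to create a channel and
    // report whether the platform supports it.
    static boolean isSctpSupported() {
        try {
            new UnsupportedSctpChannel();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("SCTP supported: " + isSctpSupported());
    }
}
```

With a stub of this kind the availability check only has to handle one well-defined exception, which would be why the extra catch clauses in the three Util.java tests are no longer needed.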
Regards, Volker On Tue, Jan 14, 2014 at 8:13 PM, Alan Bateman wrote: > On 14/01/2014 16:57, Volker Simonis wrote: > >> : >> >> test/com/sun/nio/sctp/SctpChannel/Util.java >> test/com/sun/nio/sctp/SctpMultiChannel/Util.java >> test/com/sun/nio/sctp/SctpServerChannel/Util.java >> >> - On AIX, we currently haven't implemented SCTP but we nevertheless >> compile the shared SCTP classes into the runtime class library. This way >> the AIX JDK can at least compile SCTP applications although it can not run >> them. To support this scenario, the runtime check for the availability of >> SCTP has to be extended to catch UnsatisfiedLinkError and >> NoClassDefFoundError. UnsatisfiedLinkError will be thrown the first time >> the class SctpChannelImpl is loaded because it cannot load its >> native support library in the static initialisation section. On the >> next load attempt of the class, a NoClassDefFoundError will be thrown >> because of the previously failed initialisation. >> > OS X has the same issue and the solution used there is stub > implementations that just throw UOE. Details are in jdk/src/macosx/classes/sun/nio/ch/sctp > and maybe that would work for AIX too. > > -Alan. > From volker.simonis at gmail.com Wed Jan 15 09:27:01 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 15 Jan 2014 18:27:01 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com> Message-ID: On Wed, Jan 15, 2014 at 5:34 PM, Volker Simonis wrote: > Hi Staffan, > > thanks for the review.
Please find my comments inline:
>
> On Wed, Jan 15, 2014 at 9:57 AM, Staffan Larsen
> wrote:
>>
>> Volker,
>>
>> I've looked at the following files:
>>
>> src/share/native/sun/management/DiagnosticCommandImpl.c:
>> nit: 'legel' -> 'legal' (two times)
>> In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if
>> you allow dcmd_info_array to become NULL, then
>> jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you need to
>> check that.
>
>
> Good catch. I actually had problems with malloc returning NULL in
> 'getDiagnosticCommandArgumentInfoArray()' and then changed all other
> potentially dangerous locations which used the same pattern.
>
> However I think if the 'dcmd_info_array' has zero length it would be
> perfectly fine to return a zero length array. So what about the following
> solution:
>
>   dcmdInfoCls = (*env)->FindClass(env,
> "sun/management/DiagnosticCommandInfo");
>   num_commands = (*env)->GetArrayLength(env, commands);

Sorry, of course I wanted to say "if (num_commands == 0)" here!

>   if (num_commands = 0) {
>       result = (*env)->NewObjectArray(env, 0, dcmdInfoCls, NULL);
>       if (result == NULL) {
>           JNU_ThrowOutOfMemoryError(env, 0);
>       }
>       else {
>           return result;
>       }
>   }
>   dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo));
>   if (dcmd_info_array == NULL) {
>       JNU_ThrowOutOfMemoryError(env, NULL);
>   }
>   jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array);
>   result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL);
>
> That seems easier and saves me from handling the exception.
>
> What do you think?
>
>> src/solaris/native/sun/management/OperatingSystemImpl.c
>> No comments.
>>
>> src/share/transport/socket/socketTransport.c
>> No comments.
>>
>>
>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider
>> No comments.
>>
>>
>> Thanks,
>> /Staffan
>>
>>
>>
>> On 14 jan 2014, at 09:40, Volker Simonis wrote:
>>
>> Hi,
>>
>> could you please review the following changes for the ppc-aix-port
>> stage/stage-9 repositories (the changes are planned for integration into
>> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage):
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/8031581/
>>
>> I've built and smoke tested without any problems on Linux/x86_64 and
>> PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64.
>>
>> With these changes (and together with the changes from "8028537: PPC64:
>> Updated jdk/test scripts to understand the AIX os and environment" and
>> "8031134 : PPC64: implement printing on AIX") our port passes all but the
>> following 7 jtreg regression tests on AIX (compared to the Linux/x86_64
>> baseline from www.java.net/download/jdk8/testresults/testresults.html):
>>
>> java/net/Inet6Address/B6558853.java
>> java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically)
>> java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java
>> java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically)
>> java/nio/channels/Selector/RacyDeregister.java
>> sun/security/krb5/auto/Unreachable.java (only on IPv6)
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> Following is a detailed description of the various changes:
>>
>> src/share/native/java/util/zip/zip_util.c
>> src/share/native/sun/management/DiagnosticCommandImpl.c
>>
>> According to ISO C it is perfectly legal for malloc to return a NULL
>> pointer if called with a zero argument. Fix various places where malloc can
>> legitimately return NULL because it was called with a zero argument.
>> Also fixed DiagnosticCommandImpl.c to include stdlib.h. This only fixes a
>> compiler warning on Linux, but on AIX it prevents a VM crash later on
>> because the return value of malloc() would otherwise be cast to int, which
>> is especially bad if the pointer does not fit into 32 bits.
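To illustrate the malloc() point above, here is a minimal sketch in C. The helper name alloc_array is hypothetical and is not taken from the JDK sources or the webrev; it only shows one way to treat a zero-sized request as a valid, empty result, so that a NULL pointer from malloc() can always be read as a real allocation failure. The sketch also includes stdlib.h, so malloc() is declared with its proper pointer return type.

```c
#include <stddef.h>
#include <stdlib.h>   /* declares malloc(); without it, old compilers assume an int return */

/* Hypothetical helper (an assumption for illustration, not JDK code):
 * allocate room for 'count' elements of 'elem_size' bytes each.
 * A zero-sized request succeeds with *out == NULL and never calls
 * malloc(0), whose result is implementation-defined.
 * Returns 0 on success and -1 on overflow or allocation failure. */
static int alloc_array(size_t count, size_t elem_size, void **out) {
    *out = NULL;
    if (count == 0 || elem_size == 0) {
        return 0;                       /* empty request: nothing to allocate */
    }
    if (count > (size_t)-1 / elem_size) {
        return -1;                      /* count * elem_size would overflow */
    }
    *out = malloc(count * elem_size);
    return (*out != NULL) ? 0 : -1;     /* here NULL really means failure */
}
```

A caller would check the return code rather than the pointer, which keeps the "zero elements" case and the "out of memory" case apart.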
>>
>> make/CompileJavaClasses.gmk
>>
>> Also use PollingWatchService on AIX.
>>
>> make/lib/NioLibraries.gmk
>> src/aix/native/sun/nio/ch/AixNativeThread.c
>>
>> Put the implementation for the native methods of NativeThread into
>> AixNativeThread.c on AIX.
>>
>> src/solaris/native/sun/nio/ch/PollArrayWrapper.c
>> src/solaris/native/sun/nio/ch/Net.c
>> src/aix/classes/sun/nio/ch/AixPollPort.java
>> src/aix/native/sun/nio/ch/AixPollPort.c
>> src/aix/native/java/net/aix_close.c
>>
>> On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT,
>> ...) are defined with different values than on other operating systems. The
>> problem, however, is that these constants are hardcoded as public final
>> static members of various shared Java classes. On AIX we therefore have to
>> map them from Java to native every time before calling one of the native
>> poll functions, and back to Java after the call, in order to get the right
>> semantics.
>>
>> src/share/classes/java/nio/file/CopyMoveHelper.java
>>
>> As discussed on the core-libs mailing list (see
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html)
>> it is not necessary to call Files.getFileAttributeView() with any
>> linkOptions because at that place we've already checked that the target file
>> cannot be a symbolic link. This change makes the implementation more robust
>> on platforms which support symbolic links but do not support the O_NOFOLLOW
>> flag to the open system call. It also makes the JDK pass the
>> demo/zipfs/basic.sh test on AIX.
>>
>> src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java
>>
>> Support "compound text" on AIX in the same way as on other Unix
>> platforms.
>>
>>
>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider
>>
>> Define the correct attach provider for AIX.
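The poll-constant mapping described above for PollArrayWrapper.c, Net.c and AixPollPort.c can be sketched roughly as follows. This is only an illustration under stated assumptions: the JAVA_POLL* values and all function names are made up for the example and are not quoted from the shared Java classes or from the webrev.

```c
#include <poll.h>

/* Assumed values for the constants hardcoded on the Java side.
 * These numbers are placeholders for illustration only. */
enum {
    JAVA_POLLIN  = 0x0001,
    JAVA_POLLOUT = 0x0004,
    JAVA_POLLERR = 0x0008,
    JAVA_POLLHUP = 0x0010
};

/* Translate Java-side event bits into the platform's poll() bits. */
static short java_to_native_events(int java_events) {
    short native_events = 0;
    if (java_events & JAVA_POLLIN)  native_events |= POLLIN;
    if (java_events & JAVA_POLLOUT) native_events |= POLLOUT;
    if (java_events & JAVA_POLLERR) native_events |= POLLERR;
    if (java_events & JAVA_POLLHUP) native_events |= POLLHUP;
    return native_events;
}

/* Translate the platform's poll() bits back into Java-side bits. */
static int native_to_java_events(short native_events) {
    int java_events = 0;
    if (native_events & POLLIN)  java_events |= JAVA_POLLIN;
    if (native_events & POLLOUT) java_events |= JAVA_POLLOUT;
    if (native_events & POLLERR) java_events |= JAVA_POLLERR;
    if (native_events & POLLHUP) java_events |= JAVA_POLLHUP;
    return java_events;
}

/* Hypothetical wrapper: poll one descriptor using Java-side bits on
 * both the way in and the way out. Returns what poll() returned. */
static int poll_one_java(int fd, int java_events, int timeout_ms,
                         int *java_revents) {
    struct pollfd pfd;
    int rc;
    pfd.fd = fd;
    pfd.events = java_to_native_events(java_events);
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    *java_revents = (rc > 0) ? native_to_java_events(pfd.revents) : 0;
    return rc;
}
```

On platforms where the native values happen to match the Java-side values, both translation functions reduce to the identity, which is why such a mapping is only needed on AIX.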
>>
>> src/solaris/native/java/net/net_util_md.h
>> src/solaris/native/sun/nio/ch/FileDispatcherImpl.c
>> src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c
>>
>> AIX needs a workaround for I/O cancellation (see:
>> http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm).
>> "..The close() subroutine is blocked until all subroutines which use the
>> file descriptor return to usr space. For example, when a thread is calling
>> close and another thread is calling select with the same file descriptor,
>> the close subroutine does not return until the select call returns...". To
>> fix this problem, we have to use the various NET_ wrappers which are
>> declared in net_util_md.h and defined in aix_close.c and we also need some
>> additional wrappers for fcntl(), read() and write() on AIX.
>> While the current solution isn't really nice because it introduces some
>> more AIX-specific sections in shared code, I think it is the best way to go
>> for JDK 8 because it imposes the smallest possible changes and risks for the
>> existing platforms. I'm ready to change the code to unconditionally use the
>> wrappers for all platforms and implement the wrappers empty on platforms
>> which don't need any wrapping. I think it would also be nice to clean up the
>> names (e.g. NET_Read() is currently a wrapper for recv() and the NET_ prefix
>> is probably not appropriate any more so maybe change it to something like
>> IO_). But again, I'd prefer to keep that as a follow-up change for JDK 9.
>> Calling fsync() on a "read-only" file descriptor on AIX will result in an
>> error (i.e. "EBADF: The FileDescriptor parameter is not a valid file
>> descriptor open for writing."). To prevent this error we have to query
>> whether the corresponding file descriptor is writable. Notice that at that
>> point we cannot access the writable attribute of the corresponding file
>> channel so we have to use fcntl().
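The fsync() guard mentioned above can be sketched with fcntl(F_GETFL) and the O_ACCMODE mask. The function names below are hypothetical and the code is not taken from FileDispatcherImpl.c; it is a minimal sketch of the idea, assuming that skipping fsync() on a read-only descriptor is acceptable because nothing can have been written through it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if 'fd' was opened for writing, 0 if it is read-only,
 * and -1 if fcntl() fails (for example on an invalid descriptor). */
static int fd_is_writable(int fd) {
    int flags = fcntl(fd, F_GETFL);
    int mode;
    if (flags == -1) {
        return -1;
    }
    mode = flags & O_ACCMODE;           /* isolate the access-mode bits */
    return (mode == O_WRONLY || mode == O_RDWR) ? 1 : 0;
}

/* Hypothetical wrapper: only call fsync() on descriptors that are
 * open for writing, so a read-only descriptor does not yield EBADF. */
static int fsync_if_writable(int fd) {
    int writable = fd_is_writable(fd);
    if (writable < 0) {
        return -1;                      /* propagate the fcntl() error */
    }
    if (writable == 0) {
        return 0;                       /* read-only: nothing to flush */
    }
    return fsync(fd);
}
```

The access mode has to be compared after masking with O_ACCMODE because O_RDONLY is typically defined as zero and cannot be tested with a plain bitwise AND.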
>>
>> src/solaris/classes/java/lang/UNIXProcess.java.aix
>>
>> On AIX the implementation is especially tricky, because the close() system
>> call will block if another thread is at the same time blocked in a file
>> operation (e.g. 'read()') on the same file descriptor. We therefore combine
>> the AIX ProcessPipeInputStream implementation with the
>> DeferredCloseInputStream approach used on Solaris (see
>> UNIXProcess.java.solaris). This means that every potentially blocking
>> operation on the file descriptor increments a counter before it is executed
>> and decrements it once it finishes. The 'close()' operation will only be
>> executed if there are no pending operations. Otherwise it is deferred until
>> after the last pending operation has finished.
>>
>> src/share/transport/socket/socketTransport.c
>>
>> On AIX we have to call shutdown() on a socket descriptor before closing
>> it, otherwise the close() call may be blocked. This is the same problem as
>> described before. Unfortunately the JDI framework doesn't use the same IO
>> wrappers as other class library components so we cannot easily use the
>> NET_ abstractions from aix_close.c here.
>> Without this small change all JDI regression tests will fail on AIX
>> because of the way the tests act as a "debugger" which launches another
>> VM (the "debuggee") which connects itself back to the debugger. In this
>> scenario the "debuggee" cannot shut itself down because one thread will
>> always be blocked in the close() call on one of the communication sockets.
>>
>> src/solaris/native/java/net/NetworkInterface.c
>>
>> Set the scope identifier for IPv6 addresses on AIX.
>>
>> src/solaris/native/java/net/net_util_md.c
>>
>> It turns out that we do not always have to replace SO_REUSEADDR on AIX by
>> SO_REUSEPORT. Instead we can simply use the same approach as BSD and only
>> use SO_REUSEPORT additionally, if several datagram sockets try to bind to
>> the same port.
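The shutdown-before-close pattern described above for socketTransport.c might look like the following sketch. The function name shutdown_and_close is an assumption made for this example and does not come from the webrev; the idea is simply that shutdown() ends pending operations on the socket so that the subsequent close() does not have to wait for them.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: shut the socket down in both directions before
 * closing it. The result of shutdown() is deliberately ignored, since
 * it can fail harmlessly (for example with ENOTCONN on a socket that
 * was never connected) and the descriptor must be closed either way.
 * Returns the result of close(). */
static int shutdown_and_close(int sock) {
    (void)shutdown(sock, SHUT_RDWR);    /* unblock readers and writers */
    return close(sock);
}
```

After the shutdown, the peer and any local thread blocked in a read on the socket see end-of-file instead of waiting indefinitely.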
>> Also fixed a comment and removed unused local variables.
>> Fixed the obviously inverted assignment newTime = prevTime; which should
>> read prevTime = newTime;. Otherwise prevTime will never change and the
>> timeout will potentially be reached too early.
>>
>> src/solaris/native/sun/management/OperatingSystemImpl.c
>>
>> AIX does not understand /proc/self so we have to query the real process ID
>> to access the proc file system.
>>
>> src/solaris/native/sun/nio/ch/DatagramChannelImpl.c
>>
>> On AIX, connect() may legally return EAFNOSUPPORT if called on a socket
>> with the address family set to AF_UNSPEC.
>>
>>
>>
>

From volker.simonis at gmail.com Wed Jan 15 09:52:39 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 15 Jan 2014 18:52:39 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: References: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com>
Message-ID:

On Wed, Jan 15, 2014 at 6:27 PM, Volker Simonis wrote:

> On Wed, Jan 15, 2014 at 5:34 PM, Volker Simonis
> wrote:
>> Hi Staffan,
>>
>> thanks for the review. Please find my comments inline:
>>
>> On Wed, Jan 15, 2014 at 9:57 AM, Staffan Larsen <staffan.larsen at oracle.com>
>> wrote:
>>>
>>> Volker,
>>>
>>> I've looked at the following files:
>>>
>>> src/share/native/sun/management/DiagnosticCommandImpl.c:
>>> nit: "legel" -> "legal" (two times)
>>> In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if
>>> you allow dcmd_info_array to become NULL, then
>>> jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you need to
>>> check that.
>>
>>
>> Good catch. I actually had problems with malloc returning NULL in
>> 'getDiagnosticCommandArgumentInfoArray()' and then changed all other
>> potentially dangerous locations which used the same pattern.
>>
>> However I think if the 'dcmd_info_array' has zero length it would be
>> perfectly fine to return a zero length array.
>> So what about the following
>> solution:
>>

Sorry for the noise - it seems I was a little indisposed during the last
mails :)

So this is the simple change I'd like to propose for
Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo():

@@ -117,19 +119,23 @@
       return NULL;
   }
   num_commands = (*env)->GetArrayLength(env, commands);
-  dcmd_info_array = (dcmdInfo*) malloc(num_commands *
-                                       sizeof(dcmdInfo));
+  dcmdInfoCls = (*env)->FindClass(env,
+                                  "sun/management/DiagnosticCommandInfo");
+  result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL);
+  if (result == NULL) {
+      JNU_ThrowOutOfMemoryError(env, 0);
+  }
+  if (num_commands == 0) {
+      /* Handle the 'zero commands' case specially to avoid calling 'malloc()' */
+      /* with a zero argument because that may legally return a NULL pointer.  */
+      return result;
+  }
+  dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo));
   if (dcmd_info_array == NULL) {
       JNU_ThrowOutOfMemoryError(env, NULL);
   }
   jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array);
-  dcmdInfoCls = (*env)->FindClass(env,
-                                  "sun/management/DiagnosticCommandInfo");
-  result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL);
-  if (result == NULL) {
-      free(dcmd_info_array);
-      JNU_ThrowOutOfMemoryError(env, 0);
-  }
   for (i=0; iGetObjectArrayElement(env,commands,i),

If the 'commands' input array is of zero length just return a zero length
array. OK?

>> dcmdInfoCls = (*env)->FindClass(env,
>>                                 "sun/management/DiagnosticCommandInfo");
>> num_commands = (*env)->GetArrayLength(env, commands);
>
> Sorry, of course I wanted to say "if (num_commands == 0)" here!
>
>> if (num_commands = 0) {
>>     result = (*env)->NewObjectArray(env, 0, dcmdInfoCls, NULL);
>>     if (result == NULL) {
>>         JNU_ThrowOutOfMemoryError(env, 0);
>>     }
>>     else {
>>         return result;
>>     }
>> }
>> dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo));
>> if (dcmd_info_array == NULL) {
>>     JNU_ThrowOutOfMemoryError(env, NULL);
>> }
>> jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array);
>> result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL);
>>
>> That seems easier and saves me from handling the exception.
>>
>> What do you think?
>>
>>> src/solaris/native/sun/management/OperatingSystemImpl.c
>>> No comments.
>>>
>>> src/share/transport/socket/socketTransport.c
>>> No comments.
>>>
>>>
>>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider
>>> No comments.
>>>
>>>
>>> Thanks,
>>> /Staffan

From staffan.larsen at oracle.com Wed Jan 15 11:02:26 2014
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 15 Jan 2014 20:02:26 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: References: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com>
Message-ID:

Yes, that looks like a good solution.

/Staffan

On 15 jan 2014, at 17:34, Volker Simonis wrote:

> Hi Staffan,
>
> thanks for the review. Please find my comments inline:
>
> On Wed, Jan 15, 2014 at 9:57 AM, Staffan Larsen wrote:
> Volker,
>
> I've looked at the following files:
>
> src/share/native/sun/management/DiagnosticCommandImpl.c:
> nit: "legel" -> "legal" (two times)
> In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if you allow dcmd_info_array to become NULL, then jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you need to check that.
>
> Good catch. I actually had problems with malloc returning NULL in 'getDiagnosticCommandArgumentInfoArray()' and then changed all other potentially dangerous locations which used the same pattern.
>
> However I think if the 'dcmd_info_array' has zero length it would be perfectly fine to return a zero length array.
> So what about the following solution:
>
>   dcmdInfoCls = (*env)->FindClass(env,
>                                   "sun/management/DiagnosticCommandInfo");
>   num_commands = (*env)->GetArrayLength(env, commands);
>   if (num_commands = 0) {
>       result = (*env)->NewObjectArray(env, 0, dcmdInfoCls, NULL);
>       if (result == NULL) {
>           JNU_ThrowOutOfMemoryError(env, 0);
>       }
>       else {
>           return result;
>       }
>   }
>   dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo));
>   if (dcmd_info_array == NULL) {
>       JNU_ThrowOutOfMemoryError(env, NULL);
>   }
>   jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array);
>   result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL);
>
> That seems easier and saves me from handling the exception.
>
> What do you think?
>
> src/solaris/native/sun/management/OperatingSystemImpl.c
> No comments.
>
> src/share/transport/socket/socketTransport.c
> No comments.
>
> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider
> No comments.
>
>
> Thanks,
> /Staffan

From staffan.larsen at oracle.com Wed Jan 15 11:03:01 2014
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 15 Jan 2014 20:03:01 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: References: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com>
Message-ID: <9DE753A4-E6E0-435D-8FB9-5C93CD3184D4@oracle.com>

On 15 jan 2014, at 18:27, Volker Simonis wrote:

> On Wed, Jan 15, 2014 at 5:34 PM, Volker Simonis
> wrote:
>> Hi Staffan,
>>
>> thanks for the review. Please find my comments inline:
>>
>> On Wed, Jan 15, 2014 at 9:57 AM, Staffan Larsen
>> wrote:
>>>
>>> Volker,
>>>
>>> I've looked at the following files:
>>>
>>> src/share/native/sun/management/DiagnosticCommandImpl.c:
>>> nit: "legel" -> "legal" (two times)
>>> In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if
>>> you allow dcmd_info_array to become NULL, then
>>> jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you need to
>>> check that.
>>
>>
>> Good catch. I actually had problems with malloc returning NULL in
>> 'getDiagnosticCommandArgumentInfoArray()' and then changed all other
>> potentially dangerous locations which used the same pattern.
>>
>> However I think if the 'dcmd_info_array' has zero length it would be
>> perfectly fine to return a zero length array. So what about the following
>> solution:
>>
>> dcmdInfoCls = (*env)->FindClass(env,
>>                                 "sun/management/DiagnosticCommandInfo");
>> num_commands = (*env)->GetArrayLength(env, commands);
>
> Sorry, of course I wanted to say "if (num_commands == 0)" here!
I understood as much :-) > >> if (num_commands = 0) { >> result = (*env)->NewObjectArray(env, 0, dcmdInfoCls, NULL); >> if (result == NULL) { >> JNU_ThrowOutOfMemoryError(env, 0); >> } >> else { >> return result; >> } >> } >> dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo)); >> if (dcmd_info_array == NULL) { >> JNU_ThrowOutOfMemoryError(env, NULL); >> } >> jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array); >> result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL); >> >> That seems easier and saves me from handling the exception. >> >> What do you think? >> >>> src/solaris/native/sun/management/OperatingSystemImpl.c >>> No comments. >>> >>> src/share/transport/socket/socketTransport.c >>> No comments. >>> >>> >>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider >>> No comments. >>> >>> >>> Thanks, >>> /Staffan >>> >>> >>> >>> On 14 jan 2014, at 09:40, Volker Simonis wrote: >>> >>> Hi, >>> >>> could you please review the following changes for the ppc-aix-port >>> stage/stage-9 repositories (the changes are planned for integration into >>> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/8031581/ >>> >>> I've build and smoke tested without any problems on Linux/x86_64 and >>> PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. 
>>> >>> With these changes (and together with the changes from "8028537: PPC64: >>> Updated jdk/test scripts to understand the AIX os and environment" and >>> "8031134 : PPC64: implement printing on AIX") our port passes all but the >>> following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 >>> baseline from www.java.net/download/jdk8/testresults/testresults.html?): >>> >>> java/net/Inet6Address/B6558853.java >>> java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) >>> java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java >>> java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) >>> java/nio/channels/Selector/RacyDeregister.java >>> sun/security/krb5/auto/Unreachable.java (only on IPv6) >>> >>> Thank you and best regards, >>> Volker >>> >>> >>> Following a detailed description of the various changes: >>> >>> src/share/native/java/util/zip/zip_util.c >>> src/share/native/sun/management/DiagnosticCommandImpl.c >>> >>> According to ISO C it is perfectly legal for malloc to return zero if >>> called with a zero argument. Fix various places where malloc can potentially >>> correctly return zero because it was called with a zero argument. >>> Also fixed DiagnosticCommandImpl.c to include stdlib.h. This only fixes a >>> compiler warning on Linux, but on AIX it prevents a VM crash later on >>> because the return value of malloc() will be casted to int which is >>> especially bad if that pointer was bigger than 32-bit. >>> >>> make/CompileJavaClasses.gmk >>> >>> Also use PollingWatchService on AIX. >>> >>> make/lib/NioLibraries.gmk >>> src/aix/native/sun/nio/ch/AixNativeThread.c >>> >>> Put the implementation for the native methods of NativeThread into >>> AixNativeThread.c on AIX. 
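[Editor's illustration] The malloc() note in the change description above can be sketched as follows. This is only a minimal sketch under stated assumptions: the helper name alloc_array is hypothetical and does not appear in the webrev. The idea is to treat a zero-length request as a valid empty result and to avoid calling malloc() with a zero size at all, so that a NULL return always means a real failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from the webrev.
 * Returns 0 on success and -1 on allocation failure or size overflow.
 * For a zero-sized request it succeeds and sets *out to NULL without
 * calling malloc(), because ISO C allows malloc(0) to return either
 * NULL or a unique pointer. */
int alloc_array(size_t count, size_t elem_size, void **out)
{
    assert(out != NULL);
    *out = NULL;
    if (count == 0 || elem_size == 0) {
        return 0;                 /* empty array: success, nothing to free */
    }
    if (count > SIZE_MAX / elem_size) {
        return -1;                /* count * elem_size would overflow */
    }
    *out = malloc(count * elem_size);
    return (*out != NULL) ? 0 : -1;
}
```

With such a helper the caller checks the return code instead of comparing the pointer against NULL, so an empty input is no longer mistaken for an out-of-memory condition.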
>>> >>> src/solaris/native/sun/nio/ch/PollArrayWrapper.c >>> src/solaris/native/sun/nio/ch/Net.c >>> src/aix/classes/sun/nio/ch/AixPollPort.java >>> src/aix/native/sun/nio/ch/AixPollPort.c >>> src/aix/native/java/net/aix_close.c >>> >>> On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT, >>> ...) are defined to different values than on other operating systems. The >>> problem is however, that these constants are hardcoded as public final >>> static members of various, shared Java classes. We therefore have to map >>> them from Java to native every time before calling one of the native poll >>> functions and back to Java after the call on AIX in order to get the right >>> semantics. >>> >>> src/share/classes/java/nio/file/CopyMoveHelper.java >>> >>> As discussed on the core-libs mailing list (see >>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) >>> it is not necessary to call Files.getFileAttributeView() with any >>> linkOptions because at that place we've already checked that the target file >>> can not be a symbolic link. This change makes the implementation more robust >>> on platforms which support symbolic links but do not support the O_NOFOLLOW >>> flag to the open system call. It also makes the JDK pass the >>> demo/zipfs/basic.sh test on AIX. >>> >>> src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java >>> >>> Support "compound text" on AIX in the same way like on other Unix >>> platforms. >>> >>> >>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider >>> >>> Define the correct attach provider for AIX. >>> >>> src/solaris/native/java/net/net_util_md.h >>> src/solaris/native/sun/nio/ch/FileDispatcherImpl.c >>> src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c >>> >>> AIX needs a workaround for I/O cancellation (see: >>> http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). 
>>> "..The close() subroutine is blocked until all subroutines which use the >>> file descriptor return to usr space. For example, when a thread is calling >>> close and another thread is calling select with the same file descriptor, >>> the close subroutine does not return until the select call returns...". To >>> fix this problem, we have to use the various NET_ wrappers which are >>> declared in net_util_md.h and defined in aix_close.c and we also need some >>> additional wrappers for fcntl(), read() and write() on AIX. >>> While the current solution isn't really nice because it introduces some >>> more AIX-specifc sections in shared code, I think it is the best way to go >>> for JDK 8 because it imposes the smallest possible changes and risks for the >>> existing platforms. I'm ready to change the code to unconditionally use the >>> wrappers for all platforms and implement the wrappers empty on platforms >>> which don't need any wrapping. I think it would also be nice to clean up the >>> names (e.g. NET_Read() is currently a wrapper for recv() and the NET_ prefix >>> is probably not appropriate any more so maybe change it to something like >>> IO_). But again, I'll prefer to keep that as a follow up change for JDK9. >>> Calling fsync() on a "read-only" file descriptor on AIX will result in an >>> error (i.e. "EBADF: The FileDescriptor parameter is not a valid file >>> descriptor open for writing."). To prevent this error we have to query if >>> the corresponding file descriptor is writeable. Notice that at that point we >>> can not access the writable attribute of the corresponding file channel so >>> we have to use fcntl(). >>> >>> src/solaris/classes/java/lang/UNIXProcess.java.aix >>> >>> On AIX the implementation is especially tricky, because the close() system >>> call will block if another thread is at the same time blocked in a file >>> operation (e.g. 'read()') on the same file descriptor. 
We therefore combine >>> the AIX ProcessPipeInputStream implemenatation with the >>> DeferredCloseInputStream approach used on Solaris (see >>> UNIXProcess.java.solaris). This means that every potentially blocking >>> operation on the file descriptor increments a counter before it is executed >>> and decrements it once it finishes. The 'close()' operation will only be >>> executed if there are no pending operations. Otherwise it is deferred after >>> the last pending operation has finished. >>> >>> src/share/transport/socket/socketTransport.c >>> >>> On AIX we have to call shutdown() on a socket descriptor before closing >>> it, otherwise the close() call may be blocked. This is the same problem as >>> described before. Unfortunately the JDI framework doesn't use the same IO >>> wrappers like other class library components so we can not easily use the >>> NET_ abstractions from aix_close.c here. >>> Without this small change all JDI regression tests will fail on AIX >>> because of the way how the tests act as a "debugger" which launches another >>> VM (the "debugge") which connects itself back to the debugger. In this >>> scenario the "debugge" can not shut down itself because one thread will >>> always be blocked in the close() call on one of the communication sockets. >>> >>> src/solaris/native/java/net/NetworkInterface.c >>> >>> Set the scope identifier for IPv6 addresses on AIX. >>> >>> src/solaris/native/java/net/net_util_md.c >>> >>> It turns out that we do not always have to replace SO_REUSEADDR on AIX by >>> SO_REUSEPORT. Instead we can simply use the same approach like BSD and only >>> use SO_REUSEPORT additionally, if several datagram sockets try to bind to >>> the same port. >>> Also fixed a comment and removed unused local variables. >>> Fixed the obviously inverted assignment newTime = prevTime; which should >>> read prevTime = newTime;. Otherwise prevTime will never change and the >>> timeout will be potential reached too fast. 
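[Editor's illustration] The effect of the inverted assignment described above can be sketched with a small, self-contained model of the timeout accounting. This is not the code from net_util_md.c; the function name, the parameter layout and the advance_prev switch are assumptions made purely for illustration. Each wake-up charges the elapsed time against the remaining timeout. If prevTime is never advanced, every wake-up charges the whole time since the start again, so the timeout expires too early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a retry loop that is interrupted n times.
 * wakeups_ms[i] is the clock value at the i-th interruption.
 * Returns the remaining timeout in ms, or 0 once it has expired.
 * advance_prev != 0 models the fixed code (prevTime = newTime);
 * advance_prev == 0 models the bug, where prevTime never changes. */
long remaining_timeout_ms(long timeout_ms, long start_ms,
                          const long *wakeups_ms, size_t n,
                          int advance_prev)
{
    long prev_time = start_ms;
    for (size_t i = 0; i < n; i++) {
        long new_time = wakeups_ms[i];
        timeout_ms -= new_time - prev_time;   /* charge the elapsed time */
        if (timeout_ms <= 0) {
            return 0;                         /* timed out */
        }
        if (advance_prev) {
            prev_time = new_time;             /* the fix */
        }
    }
    return timeout_ms;
}
```

For a 100 ms timeout with wake-ups at 30, 60 and 90 ms, the fixed variant still has 10 ms left, while the buggy variant subtracts 30, then 60, then 90 and reports a timeout after only 90 ms of real time.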
>>> >>> src/solaris/native/sun/management/OperatingSystemImpl.c >>> >>> AIX does not understand /proc/self so we have to query the real process ID >>> to access the proc file system. >>> >>> src/solaris/native/sun/nio/ch/DatagramChannelImpl.c >>> >>> On AIX, connect() may legally return EAFNOSUPPORT if called on a socket >>> with the address family set to AF_UNSPEC. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140115/c93b7945/attachment-0001.html From staffan.larsen at oracle.com Wed Jan 15 11:04:50 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 15 Jan 2014 20:04:50 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <94ABE61E-10BC-407E-92D0-B528165F3460@oracle.com> Message-ID: <5F6D2785-23EF-4F43-9E0E-649BEF50F204@oracle.com> On 15 jan 2014, at 18:52, Volker Simonis wrote: > > > On Wed, Jan 15, 2014 at 6:27 PM, Volker Simonis wrote: > > On Wed, Jan 15, 2014 at 5:34 PM, Volker Simonis > > wrote: > >> Hi Staffan, > >> > >> thanks for the review. Please find my comments inline: > >> > >> On Wed, Jan 15, 2014 at 9:57 AM, Staffan Larsen > >> wrote: > >>> > >>> Volker, > >>> > >>> I've looked at the following files: > >>> > >>> src/share/native/sun/management/DiagnosticCommandImpl.c: > >>> nit: 'legel' -> 'legal' (two times) > >>> In Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo() if > >>> you allow dcmd_info_array to become NULL, then > >>> jmm_interface->GetDiagnosticCommandInfo() will throw an NPE and you need to > >>> check that. > >> > >> > >> Good catch. I actually had problems with malloc returning NULL in > >> 'getDiagnosticCommandArgumentInfoArray()' and then changed all other > >> potentially dangerous locations which used the same pattern.
> >> > >> However I think if the 'dcmd_info_array' has zero length it would be > >> perfectly fine to return a zero length array. So what about the following > >> solution: > >> > > Sorry for the noise - it seems I was a little indisposed during the last mails :) > So this is the simple change I'd like to propose for Java_sun_management_DiagnosticCommandImpl_getDiagnosticCommandInfo(): > > > @@ -117,19 +119,23 @@ > return NULL; > } > num_commands = (*env)->GetArrayLength(env, commands); > - dcmd_info_array = (dcmdInfo*) malloc(num_commands * > - sizeof(dcmdInfo)); > + dcmdInfoCls = (*env)->FindClass(env, > + "sun/management/DiagnosticCommandInfo"); > + result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL); > + if (result == NULL) { > + JNU_ThrowOutOfMemoryError(env, 0); > + } > + if (num_commands == 0) { > + /* Handle the 'zero commands' case specially to avoid calling 'malloc()' */ > + /* with a zero argument because that may legally return a NULL pointer. */ > + return result; > + } > + dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo)); > if (dcmd_info_array == NULL) { > JNU_ThrowOutOfMemoryError(env, NULL); > } > jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array); > - dcmdInfoCls = (*env)->FindClass(env, > - "sun/management/DiagnosticCommandInfo"); > - result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL); > - if (result == NULL) { > - free(dcmd_info_array); > - JNU_ThrowOutOfMemoryError(env, 0); > - } > for (i=0; i<num_commands; i++) { > args = getDiagnosticCommandArgumentInfoArray(env, > (*env)->GetObjectArrayElement(env,commands,i), > > If the 'commands' input array is of zero length just return a zero length array. > OK? Yes, this looks good (still :-) ) /Staffan > > >> dcmdInfoCls = (*env)->FindClass(env, > >> "sun/management/DiagnosticCommandInfo"); > >> num_commands = (*env)->GetArrayLength(env, commands); > > > > Sorry, of course I wanted to say "if (num_commands == 0)" here!
> > > >> if (num_commands = 0) { > >> result = (*env)->NewObjectArray(env, 0, dcmdInfoCls, NULL); > >> if (result == NULL) { > >> JNU_ThrowOutOfMemoryError(env, 0); > >> } > >> else { > >> return result; > >> } > >> } > >> dcmd_info_array = (dcmdInfo*) malloc(num_commands * sizeof(dcmdInfo)); > >> if (dcmd_info_array == NULL) { > >> JNU_ThrowOutOfMemoryError(env, NULL); > >> } > >> jmm_interface->GetDiagnosticCommandInfo(env, commands, dcmd_info_array); > >> result = (*env)->NewObjectArray(env, num_commands, dcmdInfoCls, NULL); > >> > >> That seems easier and saves me from handling the exception. > >> > >> What do you think? > >> > >>> src/solaris/native/sun/management/OperatingSystemImpl.c > >>> No comments. > >>> > >>> src/share/transport/socket/socketTransport.c > >>> No comments. > >>> > >>> > >>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider > >>> No comments. > >>> > >>> > >>> Thanks, > >>> /Staffan > >>> > >>> > >>> > >>> On 14 jan 2014, at 09:40, Volker Simonis wrote: > >>> > >>> Hi, > >>> > >>> could you please review the following changes for the ppc-aix-port > >>> stage/stage-9 repositories (the changes are planned for integration into > >>> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): > >>> > >>> http://cr.openjdk.java.net/~simonis/webrevs/8031581/ > >>> > >>> I've build and smoke tested without any problems on Linux/x86_64 and > >>> PPC64, Windows/x86_64, MacOSX, Solaris/SPARC64 and AIX7PPC64. 
> >>> > >>> With these changes (and together with the changes from "8028537: PPC64: > >>> Updated jdk/test scripts to understand the AIX os and environment" and > >>> "8031134 : PPC64: implement printing on AIX") our port passes all but the > >>> following 7 jtreg regression tests on AIX (compared to the Linux/x86_64 > >>> baseline from www.java.net/download/jdk8/testresults/testresults.html?): > >>> > >>> java/net/Inet6Address/B6558853.java > >>> java/nio/channels/AsynchronousChannelGroup/Basic.java (sporadically) > >>> java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java > >>> java/nio/channels/AsynchronousChannelGroup/Unbounded.java (sporadically) > >>> java/nio/channels/Selector/RacyDeregister.java > >>> sun/security/krb5/auto/Unreachable.java (only on IPv6) > >>> > >>> Thank you and best regards, > >>> Volker > >>> > >>> > >>> Following a detailed description of the various changes: > >>> > >>> src/share/native/java/util/zip/zip_util.c > >>> src/share/native/sun/management/DiagnosticCommandImpl.c > >>> > >>> According to ISO C it is perfectly legal for malloc to return zero if > >>> called with a zero argument. Fix various places where malloc can potentially > >>> correctly return zero because it was called with a zero argument. > >>> Also fixed DiagnosticCommandImpl.c to include stdlib.h. This only fixes a > >>> compiler warning on Linux, but on AIX it prevents a VM crash later on > >>> because the return value of malloc() will be casted to int which is > >>> especially bad if that pointer was bigger than 32-bit. > >>> > >>> make/CompileJavaClasses.gmk > >>> > >>> Also use PollingWatchService on AIX. > >>> > >>> make/lib/NioLibraries.gmk > >>> src/aix/native/sun/nio/ch/AixNativeThread.c > >>> > >>> Put the implementation for the native methods of NativeThread into > >>> AixNativeThread.c on AIX. 
> >>> > >>> src/solaris/native/sun/nio/ch/PollArrayWrapper.c > >>> src/solaris/native/sun/nio/ch/Net.c > >>> src/aix/classes/sun/nio/ch/AixPollPort.java > >>> src/aix/native/sun/nio/ch/AixPollPort.c > >>> src/aix/native/java/net/aix_close.c > >>> > >>> On AIX, the constants used for the polling events (i.e. POLLIN, POLLOUT, > >>> ...) are defined to different values than on other operating systems. The > >>> problem is however, that these constants are hardcoded as public final > >>> static members of various, shared Java classes. We therefore have to map > >>> them from Java to native every time before calling one of the native poll > >>> functions and back to Java after the call on AIX in order to get the right > >>> semantics. > >>> > >>> src/share/classes/java/nio/file/CopyMoveHelper.java > >>> > >>> As discussed on the core-libs mailing list (see > >>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-December/024119.html) > >>> it is not necessary to call Files.getFileAttributeView() with any > >>> linkOptions because at that place we've already checked that the target file > >>> can not be a symbolic link. This change makes the implementation more robust > >>> on platforms which support symbolic links but do not support the O_NOFOLLOW > >>> flag to the open system call. It also makes the JDK pass the > >>> demo/zipfs/basic.sh test on AIX. > >>> > >>> src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java > >>> > >>> Support "compound text" on AIX in the same way like on other Unix > >>> platforms. > >>> > >>> > >>> src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider > >>> > >>> Define the correct attach provider for AIX. 
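[Editor's illustration] The mapping of poll event constants described above for PollArrayWrapper.c and AixPollPort.c can be sketched as a pair of translation functions. This is a minimal sketch, not the code from the webrev: the JAVA_POLL* values below are assumed stand-ins for the constants hardcoded on the Java side, and both function names are hypothetical. The native side always takes its values from <poll.h>, so the translation stays correct even where the platform defines POLLIN, POLLOUT and friends differently.

```c
#include <assert.h>
#include <poll.h>

/* Assumed values for the constants that are fixed on the Java side.
 * They are illustrative only and not copied from the JDK sources. */
enum {
    JAVA_POLLIN   = 0x0001,
    JAVA_POLLOUT  = 0x0004,
    JAVA_POLLERR  = 0x0008,
    JAVA_POLLHUP  = 0x0010,
    JAVA_POLLNVAL = 0x0020
};

/* Translate a Java-side event mask into the platform's native mask.
 * To be applied before calling a native poll function. */
int java_to_native_events(int java_events)
{
    int native_events = 0;
    if (java_events & JAVA_POLLIN)   native_events |= POLLIN;
    if (java_events & JAVA_POLLOUT)  native_events |= POLLOUT;
    if (java_events & JAVA_POLLERR)  native_events |= POLLERR;
    if (java_events & JAVA_POLLHUP)  native_events |= POLLHUP;
    if (java_events & JAVA_POLLNVAL) native_events |= POLLNVAL;
    return native_events;
}

/* Translate the native result mask back into the Java-side mask.
 * To be applied to revents after the native poll function returns. */
int native_to_java_events(int native_events)
{
    int java_events = 0;
    if (native_events & POLLIN)   java_events |= JAVA_POLLIN;
    if (native_events & POLLOUT)  java_events |= JAVA_POLLOUT;
    if (native_events & POLLERR)  java_events |= JAVA_POLLERR;
    if (native_events & POLLHUP)  java_events |= JAVA_POLLHUP;
    if (native_events & POLLNVAL) java_events |= JAVA_POLLNVAL;
    return java_events;
}
```

Translating bit by bit instead of copying the mask keeps the shared Java constants untouched, which matches the stated goal of not affecting the existing platforms.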
> >>> > >>> src/solaris/native/java/net/net_util_md.h > >>> src/solaris/native/sun/nio/ch/FileDispatcherImpl.c > >>> src/solaris/native/sun/nio/ch/ServerSocketChannelImpl.c > >>> > >>> AIX needs a workaround for I/O cancellation (see: > >>> http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/close.htm). > >>> "..The close() subroutine is blocked until all subroutines which use the > >>> file descriptor return to usr space. For example, when a thread is calling > >>> close and another thread is calling select with the same file descriptor, > >>> the close subroutine does not return until the select call returns...". To > >>> fix this problem, we have to use the various NET_ wrappers which are > >>> declared in net_util_md.h and defined in aix_close.c and we also need some > >>> additional wrappers for fcntl(), read() and write() on AIX. > >>> While the current solution isn't really nice because it introduces some > >>> more AIX-specifc sections in shared code, I think it is the best way to go > >>> for JDK 8 because it imposes the smallest possible changes and risks for the > >>> existing platforms. I'm ready to change the code to unconditionally use the > >>> wrappers for all platforms and implement the wrappers empty on platforms > >>> which don't need any wrapping. I think it would also be nice to clean up the > >>> names (e.g. NET_Read() is currently a wrapper for recv() and the NET_ prefix > >>> is probably not appropriate any more so maybe change it to something like > >>> IO_). But again, I'll prefer to keep that as a follow up change for JDK9. > >>> Calling fsync() on a "read-only" file descriptor on AIX will result in an > >>> error (i.e. "EBADF: The FileDescriptor parameter is not a valid file > >>> descriptor open for writing."). To prevent this error we have to query if > >>> the corresponding file descriptor is writeable. 
Notice that at that point we > >>> can not access the writable attribute of the corresponding file channel so > >>> we have to use fcntl(). > >>> > >>> src/solaris/classes/java/lang/UNIXProcess.java.aix > >>> > >>> On AIX the implementation is especially tricky, because the close() system > >>> call will block if another thread is at the same time blocked in a file > >>> operation (e.g. 'read()') on the same file descriptor. We therefore combine > >>> the AIX ProcessPipeInputStream implemenatation with the > >>> DeferredCloseInputStream approach used on Solaris (see > >>> UNIXProcess.java.solaris). This means that every potentially blocking > >>> operation on the file descriptor increments a counter before it is executed > >>> and decrements it once it finishes. The 'close()' operation will only be > >>> executed if there are no pending operations. Otherwise it is deferred after > >>> the last pending operation has finished. > >>> > >>> src/share/transport/socket/socketTransport.c > >>> > >>> On AIX we have to call shutdown() on a socket descriptor before closing > >>> it, otherwise the close() call may be blocked. This is the same problem as > >>> described before. Unfortunately the JDI framework doesn't use the same IO > >>> wrappers like other class library components so we can not easily use the > >>> NET_ abstractions from aix_close.c here. > >>> Without this small change all JDI regression tests will fail on AIX > >>> because of the way how the tests act as a "debugger" which launches another > >>> VM (the "debugge") which connects itself back to the debugger. In this > >>> scenario the "debugge" can not shut down itself because one thread will > >>> always be blocked in the close() call on one of the communication sockets. > >>> > >>> src/solaris/native/java/net/NetworkInterface.c > >>> > >>> Set the scope identifier for IPv6 addresses on AIX. 
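[Editor's illustration] The fsync() note earlier in this description (query with fcntl() whether a descriptor is writable before syncing it) can be sketched as below. This is a minimal sketch assuming POSIX fcntl(F_GETFL) and O_ACCMODE; the helper names fd_is_writable and sync_if_writable are hypothetical and not taken from FileDispatcherImpl.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is open for writing,
 * 0 if it is open read-only, and -1 if fcntl() fails (e.g. bad fd). */
int fd_is_writable(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    int mode = flags & O_ACCMODE;      /* isolate the access-mode bits */
    return (mode == O_WRONLY || mode == O_RDWR) ? 1 : 0;
}

/* Hypothetical wrapper: skip fsync() on read-only descriptors, where
 * the mail reports that AIX fails with EBADF. Returns 0 when the sync
 * succeeded or was skipped, -1 on error. */
int sync_if_writable(int fd)
{
    int writable = fd_is_writable(fd);
    if (writable < 0) {
        return -1;
    }
    if (writable == 0) {
        return 0;                      /* nothing to flush for a read-only fd */
    }
    return fsync(fd);
}
```

The access mode has to be masked with O_ACCMODE before comparing, because on common platforms O_RDONLY is zero and cannot be tested as a bit.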
> >>> > >>> src/solaris/native/java/net/net_util_md.c > >>> > >>> It turns out that we do not always have to replace SO_REUSEADDR on AIX by > >>> SO_REUSEPORT. Instead we can simply use the same approach like BSD and only > >>> use SO_REUSEPORT additionally, if several datagram sockets try to bind to > >>> the same port. > >>> Also fixed a comment and removed unused local variables. > >>> Fixed the obviously inverted assignment newTime = prevTime; which should > >>> read prevTime = newTime;. Otherwise prevTime will never change and the > >>> timeout will be potential reached too fast. > >>> > >>> src/solaris/native/sun/management/OperatingSystemImpl.c > >>> > >>> AIX does not understand /proc/self so we have to query the real process ID > >>> to access the proc file system. > >>> > >>> src/solaris/native/sun/nio/ch/DatagramChannelImpl.c > >>> > >>> On AIX, connect() may legally return EAFNOSUPPORT if called on a socket > >>> with the address family set to AF_UNSPEC. > >>> > >>> > >>> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140115/c4ae36c9/attachment-0001.html From david.holmes at oracle.com Wed Jan 15 19:32:24 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 16 Jan 2014 13:32:24 +1000 Subject: RFR (M): 8029396: PPC64 (part 212): Several memory ordering fixes in C-code. In-Reply-To: <52B39E70.5020500@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6DA2E@DEWDFEMB12A.global.corp.sap> <52B2CFD3.3090303@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE71DA3@DEWDFEMB12A.global.corp.sap> <52B39E70.5020500@oracle.com> Message-ID: <52D752C8.6020502@oracle.com> I had overlooked the fact that these changes had been pushed and was awaiting further discussion. :( I am concerned about the thread state related changes as they are potentially/likely redundant. 
Thread state transitions generally involve full memory fences as part of the transition. Concerns about store ordering in the lead up to that, eg _last_java_frame, need to be examined in detail to see where the variable is actually read. In many cases you will find that variables are only read in conditions where there has already been explicit synchronization between the threads involved - eg at a safepoint, or when a thread has been suspended etc. Plus I ask again: > With regards to this part of the code do you force UseMembar true for > PPC64? The memory serialization page mechanism is not reliable for > non-TSO systems. Cheers, David ----- On 20/12/2013 11:33 AM, David Holmes wrote: > Hi Goetz, > > On 20/12/2013 12:19 AM, Lindenmaier, Goetz wrote: >> Hi David, >> >> the GC stuff is only called from shared code. > > Good to hear. I always wonder though whether the cost of passing the > extra parameter through and checking it, outweighs the benefit of not > issuing the action (the release in this case)? I'm not a compiler person > but perhaps the extra parameter forces parameter passing via the stack > rather than registers, or changes an inlining decision, or maybe the > additional control flow check causes a problem ... Any hard data that > not using release semantics all the time actually yields a benefit? > >> The ordering in BiasedLocking is needed, e.g., in the context of >> force_revoke_and_rebias. >> If an other thread wants to inflate the lock written to the mark word >> in force_revoke_and_rebias, it must be assured that changing the >> displaced >> header is visible to that other thread. > > I'll take your word for it. I don't have time to try and analyse the > BiasedLocking code in depth and I don't think it is a performance issue > for that code given the potentially redundant barrier occurs during a > safepoint anyway. > >> We added the memory barriers for the _thread_state field in 2006 and can >> not reconstruct the concrete cause. 
But things as setting the >> last_java_frame >> and then the state to in_native should be ordered. > > I am much more concerned about this. I can't accept a simple wrapping of > all accesses with acquire/release semantics because there may be a few > cases where it is needed - this code is too hot for that. We need to > examine the individual cases, like last_java_frame, where you think > there is an issue. > > With regards to this part of the code do you force UseMembar true for > PPC64? The memory serialization page mechanism is not reliable for > non-TSO systems. > > My Xmas break begins in about 5 hours but I will be checking in on email > at times :) > > Cheers, > David > >> Best regards, >> Goetz. >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 19. Dezember 2013 11:52 >> To: Lindenmaier, Goetz >> Cc: 'hotspot-dev at openjdk.java.net'; >> 'ppc-aix-port-dev at openjdk.java.net'; Vladimir Kozlov >> Subject: Re: RFR (M): 8029396: PPC64 (part 212): Several memory >> ordering fixes in C-code. >> >> Somewhat late but I was away for two weeks. >> >> GC stuff: >> >> Is the use of the new release parameter always set to true from shared >> code? If so I don't have a problem with it being used to optimize the >> stores under certain conditions. But if pcc64 will pass true where other >> archs won't then again I object to this because it should be an >> algorithmic correctness requirement that is always present. >> >> >> General: I find a lot of the commentary excessively platform specific. >> Eg We don't expect any C++ compiler we currently use to emit memory >> barriers for C++ volatiles. If they start doing that we will have a >> bazillion unnecessary injected barriers in our code! >> >> BiasedLocking: >> >> It is not clear to me that the BiasedLocking change is needed. 
AFAICS >> there is only one path where revoke_bias is not called at a safepoint >> and the comments around that seem to indicate that it was considered >> safe to do so. It may be they assumed TSO when making that decision but >> I'd be interested to know how this change was arrived at. >> >> Thread state: >> >> The thread state changes have me most concerned as they are heavily used >> and so the performance impact here could be extensive. Many of them >> occur on paths that involve membars or membar-equivalent actions so they >> would seem redundant then. Again I would like to see some analysis >> showing these are in fact essential for correctness. There may well be >> some situations where they are, but to me this seems an even better >> candidate for adding the "release" parameter when needed! >> >> David >> ----- >> >> On 3/12/2013 2:51 AM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> This change contains a row of fixes to the memory ordering in >>> runtime, GC etc. >>> http://cr.openjdk.java.net/~goetz/webrevs/8029396-0-memo/ >>> >>> These are: >>> - Accessing arrays in CMS (compactibleFreeListSpace.cpp) >>> - CMS: Do release when marking a card dirty. The release must only be >>> done if GC is running. (several files) >>> - Method counter initialization (method.hpp). >>> - Order accessing f1/f2 in constant pool cache. >>> - Release stores in OopMapCache constructor (instanceKLass.cpp). >>> - BiasedLocking: Release setting object header to displaced mark. >>> - Release state of nmethod sweeper (sweeper.cpp). >>> - Do barriers when writing the thread state (thread.hpp). >>> >>> Please review and test this change. >>> >>> If requested, I can part this into smaller changes. But for now >>> I wanted to put them all into one change as they all address the >>> problems with the PPC memory model. >>> >>> Best regards, >>> Goetz. 
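[Editor's illustration] Several items in the list above ask for a "release" when publishing data to another thread. The general pattern can be sketched in portable C11 atomics. This is only an analogy under stated assumptions: HotSpot uses its own OrderAccess layer rather than <stdatomic.h>, and the mailbox type and function names here are hypothetical. The writer fills in the plain data first and then sets a flag with release semantics; a reader that observes the flag with acquire semantics is guaranteed to see the data as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox used to show release/acquire publication. */
typedef struct {
    int payload;          /* plain data, written before publication */
    atomic_int ready;     /* publication flag */
} mailbox;

void mailbox_init(mailbox *m)
{
    m->payload = 0;
    atomic_init(&m->ready, 0);
}

/* Writer: store the data, then publish it with a release store so the
 * payload write cannot be reordered after the flag write. */
void mailbox_publish(mailbox *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above.
 * Returns 1 and fills *out if a value has been published, else 0. */
int mailbox_try_read(mailbox *m, int *out)
{
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0) {
        return 0;
    }
    *out = m->payload;
    return 1;
}
```

On a TSO machine such as x86 the release store costs no extra barrier instruction, whereas weakly ordered machines such as PPC need one, which is why the missing ordering only shows up there.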
>>> From david.holmes at oracle.com Wed Jan 15 21:25:36 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 16 Jan 2014 15:25:36 +1000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> Message-ID: <52D76D50.60700@oracle.com> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: > Hi David, > > I updated the webrev: > http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ > > - I removed the IRIW example in parse3.cpp > - I adapted the comments not to point to that comment, and to > reflect the new flagging. Also I mention that we support the > volatile constructor issue, but that it's not standard. > - I protected issuing the barrier for the constructor by PPC64. > I also think it's better to separate these this way. Sorry if I wasn't clear but I'd like the wrote_volatile field declaration and all uses to be guarded by ifdef PPC64 too please. One nit I missed before. In src/share/vm/opto/library_call.cpp this comment doesn't make much sense to me and refers to ppc specific stuff in a shared file: if (is_volatile) { ! 
if (!is_store) { insert_mem_bar(Op_MemBarAcquire); ! } else { ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. insert_mem_bar(Op_MemBarVolatile); + #endif + } I don't think the comment is needed. Thanks, David > Thanks for your comments! > > Best regards, > Goetz. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Mittwoch, 15. Januar 2014 01:55 > To: Lindenmaier, Goetz > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Hi Goetz, > > Sorry for the delay in getting back to this. > > The general changes to the volatile barriers to support IRIW are okay. > The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more > specifically it is > not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of > the commentary excessive, particularly for shared code. In particular > the IRIW example in parse3.cpp - it seems a strange place to give the > explanation and I don't think we need it to that level of detail. Seems > to me that is present is globalDefinitions_ppc.hpp is quite adequate. > > The changes related to volatile writes in the constructor, as discussed > are not required by the Java Memory Model. If you want to keep these > then I think they should all be guarded with PPC64 because it is not > related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the > PPC64 porters. > > Thanks, > David > > On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> I updated this webrev. I detected a small flaw I made when editing this version. >> The #endif in line 322, parse3.cpp was in the wrong line. >> I also based the webrev on the latest version of the stage repo. >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >> >> Best regards, >> Goetz. 
>> >> -----Original Message----- >> From: Lindenmaier, Goetz >> Sent: Freitag, 20. Dezember 2013 13:47 >> To: David Holmes >> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Hi David, >> >>> So we can at least undo #4 now we have established those tests were not >>> required to pass. >> We would prefer if we could keep this in. We want to avoid that it's >> blamed on the VM if java programs are failing on PPC after they worked >> on x86. To clearly mark it as overfulfilling the spec I would guard it by >> a flag as proposed. But if you insist I will remove it. Also, this part is >> not that performance relevant. >> >>> A compile-time guard (ifdef) would be better than a runtime one I think >> I added a compile-time guard in this new webrev: >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >> several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC) >> but this way I only have to change the ppc platform. >> >> Best regards, >> Goetz >> >> P.S.: I will also be available over the Christmas period. >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Freitag, 20. Dezember 2013 05:58 >> To: Lindenmaier, Goetz >> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Sorry for the delay, it takes a while to catch up after two weeks >> vacation :) Next vacation (ie next two weeks) I'll continue to check emails. >> >> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> ok, I understand the tests are wrong. It's good this issue is settled. >>> Thanks Aleksey and Andreas for going into the details of the proof! 
>>> >>> About our change: David, the causality is the other way round. >>> The change is about IRIW. >>> 1. To pass IRIW, we must use sync instructions before loads. >> >> This is the part I still have some question marks over as the >> implications are not nice for performance on non-TSO platforms. But I'm >> no further along in processing that paper I'm afraid. >> >>> 2. If we do syncs before loads, we don't need to do them after stores. >>> 3. If we don't do them after stores, we fail the volatile constructor tests. >>> 4. So finally we added them again at the end of the constructor after stores >>> to pass the volatile constructor tests. >> >> So we can at least undo #4 now we have established those tests were not >> required to pass. >> >>> We originally passed the constructor tests because the ppc memory order >>> instructions are not as find-granular as the >>> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction >>> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the >>> MemBarVolatile after the store fixes the constructor tests. The proper representation >>> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless >>> anyways. >>> >>>> I'm not happy with the ifdef approach but I won't block it. >>> I'd be happy to add a property >>> OrderAccess::cpu_is_multiple_copy_atomic() >> >> A compile-time guard (ifdef) would be better than a runtime one I think >> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >> based not architecture based) as that will allows for turning this >> on/off for any architecture for testing purposes. >> >> Thanks, >> David >> >>> or the like to guard the customization. I'd like that much better. Or also >>> OrderAccess::needs_support_iriw_ordering() >>> VM_Version::needs_support_iriw_ordering() >>> >>> >>> Best regards, >>> Goetz. 
>>> >>> >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Donnerstag, 28. November 2013 00:34 >>> To: Lindenmaier, Goetz >>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> TL;DR version: >>> >>> Discussion on the c-i list has now confirmed that a constructor-barrier >>> for volatiles is not required as part of the JMM specification. It *may* >>> be required in an implementation that doesn't pre-zero memory to ensure >>> you can't see uninitialized fields. So the tests for this are invalid >>> and this part of the patch is not needed in general (ppc64 may need it >>> due to other factors). >>> >>> Re: "multiple copy atomicity" - first thanks for correcting the term :) >>> Second thanks for the reference to that paper! For reference: >>> >>> "The memory system (perhaps involving a hierarchy of buffers and a >>> complex interconnect) does not guarantee that a write becomes visible to >>> all other hardware threads at the same time point; these architectures >>> are not multiple-copy atomic." >>> >>> This is the visibility issue that I referred to and affects both ARM and >>> PPC. But of course it is normally handled by using suitable barriers >>> after the stores that need to be visible. I think the crux of the >>> current issue is what you wrote below: >>> >>> > The fixes for the constructor issue are only needed because we >>> > remove the sync instruction from behind stores (parse3.cpp:320) >>> > and place it before loads. >>> >>> I hadn't grasped this part. Obviously if you fail to do the sync after >>> the store then you have to do something around the loads to get the same >>> results! 
I still don't know what lead you to the conclusion that the >>> only way to fix the IRIW issue was to put the fence before the load - >>> maybe when I get the chance to read that paper in full it will be clearer. >>> >>> So ... the basic problem is that the current structure in the VM has >>> hard-wired one choice of how to get the right semantics for volatile >>> variables. You now want to customize that but not all the requisite >>> hooks are present. It would be better if volatile_load and >>> volatile_store were factored out so that they could be implemented as >>> desired per-platform. Alternatively there could be pre- and post- hooks >>> that could then be customized per platform. Otherwise you need >>> platform-specific ifdef's to handle it as per your patch. >>> >>> I'm not happy with the ifdef approach but I won't block it. I think this >>> is an area where a lot of clean up is needed in the VM. The barrier >>> abstractions are a confused mess in my opinion. >>> >>> Thanks, >>> David >>> ----- >>> >>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>> >>>> I did not yet add the >>>> OrderAccess::needs_support_iriw_ordering() >>>> VM_Version::needs_support_iriw_ordering() >>>> or >>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>> to reduce #defined, as I got no further comment on that. >>>> >>>> >>>> WRT to the validity of the tests and the interpretation of the JMM >>>> I feel not in the position to contribute substantially. >>>> >>>> But we would like to pass the torture test suite as we consider >>>> this a substantial task in implementing a PPC port. Also we think >>>> both tests show behavior a programmer would expect. It's bad if >>>> Java code runs fine on the more common x86 platform, and then >>>> fails on ppc. This will always first be blamed on the VM. 
>>>>
>>>> The fixes for the constructor issue are only needed because we
>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>> and place it before loads. Then there is no sync between volatile store
>>>> and publishing the object. So we add it again in this one case
>>>> (volatile store in constructor).
>>>>
>>>>
>>>> @David
>>>>>> Sure. There also is no solution as you require for the taskqueue problem yet,
>>>>>> and that's being discussed now for almost a year.
>>>>> It may have started a year ago but work on it has hardly been continuous.
>>>> That's not true, we did a lot of investigation and testing on this issue.
>>>> And we came up with a solution we consider the best possible. If you
>>>> have objections, you should at least give the draft of a better solution,
>>>> we would volunteer to implement and test it.
>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>
>>>> @David
>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>> can't find any reference to it.
>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem.
>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>
>>>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read'
>>>> instead. Sorry for that. (I also fixed that in the method name above).
>>>>
>>>> Best regards and thanks for all your involvements,
>>>>   Goetz.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Mittwoch, 27.
November 2013 12:53 >>>> To: Lindenmaier, Goetz >>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> Hi Goetz, >>>> >>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>> Hi David, >>>>> >>>>> -- Volatile in constuctor >>>>>> AFAIK we have not seen those tests fail due to a >>>>>> missing constructor barrier. >>>>> We see them on PPC64. Our test machines have typically 8-32 processors >>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!) >>>> >>>> And see follow ups - the tests are invalid. >>>> >>>>> -- IRIW issue >>>>>> I can not possibly answer to the necessary level of detail with a few >>>>>> moments thought. >>>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>>> and that's being discussed now for almost a year. >>>> >>>> It may have started a year ago but work on it has hardly been continuous. >>>> >>>>>> You are implying there is a problem here that will >>>>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>>>> solution with the #defines, and that's correct for all, but not nice, I admit. >>>>> (I don't really know about ARM, though). >>>>> So if I can write down a nicer solution testing for methods that are evaluated >>>>> by the C-compiler I'm happy. >>>>> >>>>> The problem is not that IRIW is not handled by the JMM, the problem >>>>> is that >>>>> store >>>>> sync >>>>> does not assure multiple-read-atomicity, >>>>> only >>>>> sync >>>>> load >>>>> does so on PPC. And you require multiple-read-atomicity to >>>>> pass that test. >>>> >>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>> can't find any reference to it. >>>> >>>> Thanks, >>>> David >>>> >>>> The JMM is fine. 
And >>>>> store >>>>> MemBarVolatile >>>>> is fine on x86, sparc etc. as there exist assembler instructions that >>>>> do what is required. >>>>> >>>>> So if you are off soon, please let's come to a solution that >>>>> might be improvable in the way it's implemented, but that >>>>> allows us to implement a correct PPC64 port. >>>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>> To: Lindenmaier, Goetz >>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>> >>>>> Hi Goetz, >>>>> >>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>> Hi everybody, >>>>>> >>>>>> thanks a lot for the detailed reviews! >>>>>> I'll try to answer to all in one mail. >>>>>> >>>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>>>> We don't think it's correct if we omit the barrier after initializing >>>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>>>> and Doug Lea, and they agreed. >>>>>> Also, concurrency torture tests >>>>>> LongVolatileTest >>>>>> AtomicIntegerInitialValueTest >>>>>> will fail. >>>>>> (In addition, observing 0 instead of the inital value of a volatile field would be >>>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>>>> >>>>> The affects of unsafe publication are always surprising - volatiles do >>>>> not add anything special here. AFAIK there is nothing in the JMM that >>>>> requires the constructor barrier - discussions with Doug and Aleksey >>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>> missing constructor barrier. 
>>>>> >>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>>>> see a way to implement this cheaper. >>>>>> >>>>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>>>> Basically, I agree on this. But you also have to take into account >>>>>> that due to the different memory ordering instructions on different platforms >>>>>> just implementing something empty is not sufficient. >>>>>> An example: >>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>> MemBarVolatile // means StoreLoad barrier >>>>>> If these are consecutively in the code, sparc code looks like this: >>>>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>> Just doing what is required. >>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>> fine grained operations: >>>>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>>>> I need an additional optimization that removes the lwsync. I can not implement >>>>>> MemBarRelease empty, as it is also used independently. >>>>>> >>>>>> Back to the IRIW problem. I think here we have a comparable issue. >>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>> >>>>>> I would propose to guard the code by >>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>> Else, David, how would you propose to implement this platform independent? >>>>>> (Maybe we can also use above method in taskqueue.hpp.) 
>>>>> >>>>> I can not possibly answer to the necessary level of detail with a few >>>>> moments thought. You are implying there is a problem here that will >>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>> different?) and I can not take that on face value at the moment. The >>>>> only reason I can see IRIW not being handled by the JMM requirements for >>>>> volatile accesses is if there are global visibility issues that are not >>>>> addressed - but even then I would expect heavy barriers at the store >>>>> would deal with that, not at the load. (This situation reminds me of the >>>>> need for read-barriers on Alpha architecture due to the use of software >>>>> cache-coherency rather than hardware cache-coherency - but we don't have >>>>> that on ppc!) >>>>> >>>>> Sorry - There is no quick resolution here and in a couple of days I will >>>>> be heading out on vacation for two weeks. >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -- Other ports: >>>>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>>> >>>>>> -- MemBarStoreStore after initialization >>>>>> I agree we should not change it in the ppc port. If you wish, I can >>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>> To: Vladimir Kozlov >>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>>> >>>>>> Okay this is my second attempt at answering this in a reasonable way :) >>>>>> >>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>> I have to ask David to do correctness evaluation. >>>>>> >>>>>> From what I understand what we see here is an attempt to fix an >>>>>> existing issue with the implementation of volatiles so that the IRIW >>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>>>> load. >>>>>> >>>>>> Now if this was purely handled in ppc64 source code then I would be >>>>>> happy to let them do whatever they like (surely this kills performance >>>>>> though!). But I do not agree with the changes to the shared code that >>>>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>>>> polluting the shared code. My concern is similar to what I said with the >>>>>> taskQueue changes - these algorithms should be expressed using the >>>>>> correct OrderAccess operations to guarantee the desired properties >>>>>> independent of architecture. If such a "barrier" is not needed on a >>>>>> given architecture then the implementation in OrderAccess should reduce >>>>>> to a no-op. >>>>>> >>>>>> And as Vitaly points out the constructor barriers are not needed under >>>>>> the JMM. >>>>>> >>>>>>> I am fine with suggested changes because you did not change our current >>>>>>> code for our platforms (please, do not change do_exits() now). 
>>>>>>> But maybe it should be done using more general query which is set
>>>>>>> depending on platform:
>>>>>>>
>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>
>>>>>>> or similar to what we use now:
>>>>>>>
>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>
>>>>>> Every platform has to support IRIW this is simply part of the Java
>>>>>> Memory Model, there should not be any need to call this out explicitly
>>>>>> like this.
>>>>>>
>>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>>>> From what I understand our ppc port is also affected. David?
>>>>>>
>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> In library_call.cpp can you add {}? New comment should be inside else {}.
>>>>>>>
>>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>
>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>
>>>>>>>     if (is_vol) {
>>>>>>>       // See comment in do_get_xxx().
>>>>>>> #ifndef PPC64
>>>>>>>       insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>> #else
>>>>>>>       if (is_field) {
>>>>>>>         // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>>>         set_wrote_volatile(true);
>>>>>>>       }
>>>>>>> #endif
>>>>>>>     }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>>> the torture test suite:
>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>                        volatile x=0, y=0
>>>>>>>>  __________    __________    __________    __________
>>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>
>>>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>>                 read(y)                     read(x)
>>>>>>>>
>>>>>>>>  Disallowed:    x=1, y=0                    y=1, x=0
>>>>>>>>
>>>>>>>>
>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>
>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>
>>>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>> published.
>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>>>>> strong.
>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> Please review and test this change.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>   Goetz.
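
For readers who want to try the IRIW example from the original posting outside the JVM, here is a minimal sketch in C++. It is not the VolatileIRIWTest from the torture suite; it assumes that Java volatile accesses can be modelled by sequentially consistent std::atomic accesses, and the names IriwResult, is_forbidden, run_iriw_once and count_forbidden are made up for illustration. Under that assumption the two readers must never disagree on the order of the two writes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What the two reader threads observed in one run (hypothetical helper type).
struct IriwResult {
  int r1_x, r1_y;  // Thread 1: read(x), then read(y)
  int r3_y, r3_x;  // Thread 3: read(y), then read(x)
};

// True for the outcome marked "Disallowed" in the example:
// Thread 1 sees x=1, y=0 while Thread 3 sees y=1, x=0.
inline bool is_forbidden(const IriwResult& r) {
  return r.r1_x == 1 && r.r1_y == 0 && r.r3_y == 1 && r.r3_x == 0;
}

// Runs the four threads once. seq_cst stands in for Java volatile here;
// that mapping is an assumption of this sketch.
inline IriwResult run_iriw_once() {
  std::atomic<int> x{0}, y{0};
  IriwResult r{};
  std::thread t0([&] { x.store(1, std::memory_order_seq_cst); });
  std::thread t1([&] {
    r.r1_x = x.load(std::memory_order_seq_cst);
    r.r1_y = y.load(std::memory_order_seq_cst);
  });
  std::thread t2([&] { y.store(1, std::memory_order_seq_cst); });
  std::thread t3([&] {
    r.r3_y = y.load(std::memory_order_seq_cst);
    r.r3_x = x.load(std::memory_order_seq_cst);
  });
  t0.join(); t1.join(); t2.join(); t3.join();
  return r;
}

// Repeats the experiment and counts forbidden outcomes.
inline int count_forbidden(int iterations) {
  int forbidden = 0;
  for (int i = 0; i < iterations; ++i) {
    if (is_forbidden(run_iriw_once())) ++forbidden;
  }
  return forbidden;
}
```

With weaker orderings (for example acquire loads and release stores) the language no longer rules out the forbidden outcome, which is roughly the situation the thread describes for a store followed by sync on a CPU that is not multiple-copy atomic.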
>>>>>>>>
From david.holmes at oracle.com Wed Jan 15 21:30:23 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 16 Jan 2014 15:30:23 +1000
Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization
In-Reply-To: <52B3A3AF.9050609@oracle.com>
References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap>
 <52B3A3AF.9050609@oracle.com>
Message-ID: <52D76E6F.8070504@oracle.com>

Can I get some response on this please - specifically the redundancy wrt
IRT_ENTRY actions.

Thanks,
David

On 20/12/2013 11:55 AM, David Holmes wrote:
> Still catching up ...
>
> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> this change adds StoreStore barriers after object initialization and
>> after constructor calls in the C++ interpreter. This assures no
>> uninitialized objects or final fields are visible.
>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/
>
> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize
> thread state transitions that already include a full "fence" so the
> storestore barriers are redundant in those cases.
>
> The fastpath _new storestore seems okay.
>
> I don't know how handle_return gets used to know if it is reasonable or
> not.
>
> I was trying, unsuccessfully, to examine the same code in the
> templateInterpreter to see how it handles these cases as it naturally
> has the same object-initialization-safety requirements (though these can
> be handled in a number of different ways other than an unconditional
> storestore barrier at the end of the initialization and construction
> phases).
>
> David
> -----
>
>> Please review and test this change.
>>
>> Best regards,
>>   Goetz.
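
The idea behind the StoreStore barrier after object initialization can be sketched in portable C++ as follows. This is only an illustration, not the cppInterpreter code: Widget, g_published, construct_and_publish and try_read are hypothetical names, and because C++ has no pure StoreStore fence the sketch uses a release fence, which is stronger (it also orders earlier loads before the publishing store).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with two fields set during "construction".
struct Widget {
  int a;
  int b;
};

// Shared slot through which the object becomes visible to other threads.
std::atomic<Widget*> g_published{nullptr};

// Initialize the fields, then fence, then publish the reference.
// The fence keeps the field stores from being reordered after the
// store of the reference.
inline void construct_and_publish(Widget* w, int a, int b) {
  w->a = a;
  w->b = b;
  std::atomic_thread_fence(std::memory_order_release);  // covers StoreStore
  g_published.store(w, std::memory_order_relaxed);
}

// Reader side: returns nullptr until something was published. The acquire
// load pairs with the release fence above, so a non-null result implies
// the initialized fields are visible.
inline const Widget* try_read() {
  return g_published.load(std::memory_order_acquire);
}
```

If the publishing path already passes through a full fence, as the reply above says is the case for the IRT_ENTRY transitions, an additional barrier of this kind adds nothing.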
>> From vladimir.kozlov at oracle.com Wed Jan 15 23:13:27 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 15 Jan 2014 23:13:27 -0800 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52D76D50.60700@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> Message-ID: <52D78697.2090408@oracle.com> This is becoming ugly #ifdef mess. In compiler code we are trying to avoid them. I suggested to have _wrote_volatile without #ifdef and I want to keep it this way, it could be useful to have such info on other platforms too. But I would suggest to remove PPC64 comments in parse.hpp. 
In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
which could be checked in all places instead of #ifdef:

#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever

and then:

#define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
   oop p = JNIHandles::resolve(obj); \
+  if (support_IRIW_for_not_multiple_copy_atomic_cpu) OrderAccess::fence(); \
   volatile type_name v = OrderAccess::load_acquire((volatile type_name*)index_oop_from_field_offset_long(p, offset));

And:

+  if (support_IRIW_for_not_multiple_copy_atomic_cpu && field->is_volatile()) {
+    insert_mem_bar(Op_MemBarVolatile);   // StoreLoad barrier
+  }

And so on. The comments will be needed only in globalDefinitions.hpp.

The code in parse1.cpp could be put on one line:

+  if (wrote_final() PPC64_ONLY( || (wrote_volatile() && method()->is_initializer()) )) {

Thanks,
Vladimir

On 1/15/14 9:25 PM, David Holmes wrote:
> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>> Hi David,
>>
>> I updated the webrev:
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>
>> - I removed the IRIW example in parse3.cpp
>> - I adapted the comments not to point to that comment, and to
>>   reflect the new flagging. Also I mention that we support the
>>   volatile constructor issue, but that it's not standard.
>> - I protected issuing the barrier for the constructor by PPC64.
>>   I also think it's better to separate these this way.
>
> Sorry if I wasn't clear but I'd like the wrote_volatile field declaration and all uses to be guarded by ifdef PPC64 too
> please.
>
> One nit I missed before. In src/share/vm/opto/library_call.cpp this comment doesn't make much sense to me and refers to
> ppc specific stuff in a shared file:
>
>     if (is_volatile) {
> !     if (!is_store) {
>         insert_mem_bar(Op_MemBarAcquire);
> !     } else {
> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
> !       // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>         insert_mem_bar(Op_MemBarVolatile);
> + #endif
> +     }
>
> I don't think the comment is needed.
>
> Thanks,
> David
>
>> Thanks for your comments!
>>
>> Best regards,
>>   Goetz.
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Mittwoch, 15. Januar 2014 01:55
>> To: Lindenmaier, Goetz
>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi Goetz,
>>
>> Sorry for the delay in getting back to this.
>>
>> The general changes to the volatile barriers to support IRIW are okay.
>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>> specifically it is
>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>> the commentary excessive, particularly for shared code. In particular
>> the IRIW example in parse3.cpp - it seems a strange place to give the
>> explanation and I don't think we need it to that level of detail. Seems
>> to me that what is present in globalDefinitions_ppc.hpp is quite adequate.
>>
>> The changes related to volatile writes in the constructor, as discussed
>> are not required by the Java Memory Model. If you want to keep these
>> then I think they should all be guarded with PPC64 because it is not
>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>> PPC64 porters.
>>
>> Thanks,
>> David
>>
>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> I updated this webrev. I detected a small flaw I made when editing this version.
>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>> I also based the webrev on the latest version of the stage repo.
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>
>>> Best regards,
>>>   Goetz.
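
The suggestion in Vladimir's reply above, one constant derived from CPU_NOT_MULTIPLE_COPY_ATOMIC that is tested with an ordinary if instead of repeating #ifdef blocks, can be sketched stand-alone as below. Only the constant and the macro name come from the mail; volatile_load and volatile_store are hypothetical helpers, and std::atomic fences stand in for OrderAccess::fence() and the MemBar nodes. The sketch shows where the barriers go in each mode and makes no claim that these std::atomic orderings reproduce HotSpot's guarantees.

```cpp
#include <atomic>
#include <cassert>

// One place that turns the platform macro into a compile-time constant.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical volatile read: full fence before the load only in IRIW mode
// (the "sync-load-acquire" scheme described in the thread).
inline int volatile_load(const std::atomic<int>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return field.load(std::memory_order_acquire);
}

// Hypothetical volatile write: release store, followed by a full
// (StoreLoad) fence only when the fence is not done on the load side.
inline void volatile_store(std::atomic<int>& field, int value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
}
```

Because the condition is a compile-time constant, the compiler can drop the untaken branch, so the shared code stays free of preprocessor blocks without paying for a runtime check.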
>>> >>> -----Original Message----- >>> From: Lindenmaier, Goetz >>> Sent: Freitag, 20. Dezember 2013 13:47 >>> To: David Holmes >>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi David, >>> >>>> So we can at least undo #4 now we have established those tests were not >>>> required to pass. >>> We would prefer if we could keep this in. We want to avoid that it's >>> blamed on the VM if java programs are failing on PPC after they worked >>> on x86. To clearly mark it as overfulfilling the spec I would guard it by >>> a flag as proposed. But if you insist I will remove it. Also, this part is >>> not that performance relevant. >>> >>>> A compile-time guard (ifdef) would be better than a runtime one I think >>> I added a compile-time guard in this new webrev: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>> several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC) >>> but this way I only have to change the ppc platform. >>> >>> Best regards, >>> Goetz >>> >>> P.S.: I will also be available over the Christmas period. >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Freitag, 20. Dezember 2013 05:58 >>> To: Lindenmaier, Goetz >>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Sorry for the delay, it takes a while to catch up after two weeks >>> vacation :) Next vacation (ie next two weeks) I'll continue to check emails. >>> >>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> ok, I understand the tests are wrong. It's good this issue is settled. >>>> Thanks Aleksey and Andreas for going into the details of the proof! 
>>>> >>>> About our change: David, the causality is the other way round. >>>> The change is about IRIW. >>>> 1. To pass IRIW, we must use sync instructions before loads. >>> >>> This is the part I still have some question marks over as the >>> implications are not nice for performance on non-TSO platforms. But I'm >>> no further along in processing that paper I'm afraid. >>> >>>> 2. If we do syncs before loads, we don't need to do them after stores. >>>> 3. If we don't do them after stores, we fail the volatile constructor tests. >>>> 4. So finally we added them again at the end of the constructor after stores >>>> to pass the volatile constructor tests. >>> >>> So we can at least undo #4 now we have established those tests were not >>> required to pass. >>> >>>> We originally passed the constructor tests because the ppc memory order >>>> instructions are not as find-granular as the >>>> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction >>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the >>>> MemBarVolatile after the store fixes the constructor tests. The proper representation >>>> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless >>>> anyways. >>>> >>>>> I'm not happy with the ifdef approach but I won't block it. >>>> I'd be happy to add a property >>>> OrderAccess::cpu_is_multiple_copy_atomic() >>> >>> A compile-time guard (ifdef) would be better than a runtime one I think >>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>> based not architecture based) as that will allows for turning this >>> on/off for any architecture for testing purposes. >>> >>> Thanks, >>> David >>> >>>> or the like to guard the customization. I'd like that much better. Or also >>>> OrderAccess::needs_support_iriw_ordering() >>>> VM_Version::needs_support_iriw_ordering() >>>> >>>> >>>> Best regards, >>>> Goetz. 
>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Donnerstag, 28. November 2013 00:34 >>>> To: Lindenmaier, Goetz >>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> TL;DR version: >>>> >>>> Discussion on the c-i list has now confirmed that a constructor-barrier >>>> for volatiles is not required as part of the JMM specification. It *may* >>>> be required in an implementation that doesn't pre-zero memory to ensure >>>> you can't see uninitialized fields. So the tests for this are invalid >>>> and this part of the patch is not needed in general (ppc64 may need it >>>> due to other factors). >>>> >>>> Re: "multiple copy atomicity" - first thanks for correcting the term :) >>>> Second thanks for the reference to that paper! For reference: >>>> >>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>> complex interconnect) does not guarantee that a write becomes visible to >>>> all other hardware threads at the same time point; these architectures >>>> are not multiple-copy atomic." >>>> >>>> This is the visibility issue that I referred to and affects both ARM and >>>> PPC. But of course it is normally handled by using suitable barriers >>>> after the stores that need to be visible. I think the crux of the >>>> current issue is what you wrote below: >>>> >>>> > The fixes for the constructor issue are only needed because we >>>> > remove the sync instruction from behind stores (parse3.cpp:320) >>>> > and place it before loads. >>>> >>>> I hadn't grasped this part. Obviously if you fail to do the sync after >>>> the store then you have to do something around the loads to get the same >>>> results! 
I still don't know what lead you to the conclusion that the >>>> only way to fix the IRIW issue was to put the fence before the load - >>>> maybe when I get the chance to read that paper in full it will be clearer. >>>> >>>> So ... the basic problem is that the current structure in the VM has >>>> hard-wired one choice of how to get the right semantics for volatile >>>> variables. You now want to customize that but not all the requisite >>>> hooks are present. It would be better if volatile_load and >>>> volatile_store were factored out so that they could be implemented as >>>> desired per-platform. Alternatively there could be pre- and post- hooks >>>> that could then be customized per platform. Otherwise you need >>>> platform-specific ifdef's to handle it as per your patch. >>>> >>>> I'm not happy with the ifdef approach but I won't block it. I think this >>>> is an area where a lot of clean up is needed in the VM. The barrier >>>> abstractions are a confused mess in my opinion. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>> Hi, >>>>> >>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>> >>>>> I did not yet add the >>>>> OrderAccess::needs_support_iriw_ordering() >>>>> VM_Version::needs_support_iriw_ordering() >>>>> or >>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>> to reduce #defined, as I got no further comment on that. >>>>> >>>>> >>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>> I feel not in the position to contribute substantially. >>>>> >>>>> But we would like to pass the torture test suite as we consider >>>>> this a substantial task in implementing a PPC port. Also we think >>>>> both tests show behavior a programmer would expect. It's bad if >>>>> Java code runs fine on the more common x86 platform, and then >>>>> fails on ppc. This will always first be blamed on the VM. 
>>>>> >>>>> The fixes for the constructor issue are only needed because we >>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>> and place it before loads. Then there is no sync between volatile store >>>>> and publishing the object. So we add it again in this one case >>>>> (volatile store in constructor). >>>>> >>>>> >>>>> @David >>>>>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>>>>> and that's being discussed now for almost a year. >>>>>> It may have started a year ago but work on it has hardly been continuous. >>>>> That's not true, we did a lot of investigation and testing on this issue. >>>>> And we came up with a solution we consider the best possible. If you >>>>> have objections, you should at least give the draft of a better solution, >>>>> we would volunteer to implement and test it. >>>>> Similarly, we invested time in fixing the concurrency torture issues. >>>>> >>>>> @David >>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>>>> can't find any reference to it. >>>>> We learned about this reading "A Tutorial Introduction to the ARM and >>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for >>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem. >>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>> >>>>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read' >>>>> instead. Sorry for that. (I also fixed that in the method name above). >>>>> >>>>> Best regards and thanks for all your involvements, >>>>> Goetz. >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Mittwoch, 27.
November 2013 12:53 >>>>> To: Lindenmaier, Goetz >>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>> >>>>> Hi Goetz, >>>>> >>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>> Hi David, >>>>>> >>>>>> -- Volatile in constuctor >>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>> missing constructor barrier. >>>>>> We see them on PPC64. Our test machines have typically 8-32 processors >>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!) >>>>> >>>>> And see follow ups - the tests are invalid. >>>>> >>>>>> -- IRIW issue >>>>>>> I can not possibly answer to the necessary level of detail with a few >>>>>>> moments thought. >>>>>> Sure. There also is no solution as you require for the taskqueue problem yet, >>>>>> and that's being discussed now for almost a year. >>>>> >>>>> It may have started a year ago but work on it has hardly been continuous. >>>>> >>>>>>> You are implying there is a problem here that will >>>>>>> impact numerous platforms (unless you can tell me why ppc is so different?) >>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a >>>>>> solution with the #defines, and that's correct for all, but not nice, I admit. >>>>>> (I don't really know about ARM, though). >>>>>> So if I can write down a nicer solution testing for methods that are evaluated >>>>>> by the C-compiler I'm happy. >>>>>> >>>>>> The problem is not that IRIW is not handled by the JMM, the problem >>>>>> is that >>>>>> store >>>>>> sync >>>>>> does not assure multiple-read-atomicity, >>>>>> only >>>>>> sync >>>>>> load >>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>> pass that test. >>>>> >>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>>> can't find any reference to it. 
>>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> The JMM is fine. And >>>>>> store >>>>>> MemBarVolatile >>>>>> is fine on x86, sparc etc. as there exist assembler instructions that >>>>>> do what is required. >>>>>> >>>>>> So if you are off soon, please let's come to a solution that >>>>>> might be improvable in the way it's implemented, but that >>>>>> allows us to implement a correct PPC64 port. >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>>> >>>>>> Hi Goetz, >>>>>> >>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi everybody, >>>>>>> >>>>>>> thanks a lot for the detailed reviews! >>>>>>> I'll try to answer to all in one mail. >>>>>>> >>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned; >>>>>>> We don't think it's correct if we omit the barrier after initializing >>>>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev >>>>>>> and Doug Lea, and they agreed. >>>>>>> Also, concurrency torture tests >>>>>>> LongVolatileTest >>>>>>> AtomicIntegerInitialValueTest >>>>>>> will fail. >>>>>>> (In addition, observing 0 instead of the inital value of a volatile field would be >>>>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.) >>>>>> >>>>>> The affects of unsafe publication are always surprising - volatiles do >>>>>> not add anything special here. AFAIK there is nothing in the JMM that >>>>>> requires the constructor barrier - discussions with Doug and Aleksey >>>>>> notwithstanding. 
AFAIK we have not seen those tests fail due to a >>>>>> missing constructor barrier. >>>>>> >>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>>>> Yes, it costs measurable performance. But else it is wrong. We don't >>>>>>> see a way to implement this cheaper. >>>>>>> >>>>>>>> - these algorithms should be expressed using the correct OrderAccess operations >>>>>>> Basically, I agree on this. But you also have to take into account >>>>>>> that due to the different memory ordering instructions on different platforms >>>>>>> just implementing something empty is not sufficient. >>>>>>> An example: >>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>> If these are consecutively in the code, sparc code looks like this: >>>>>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore) >>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>> Just doing what is required. >>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>> fine grained operations: >>>>>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad >>>>>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad >>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful, >>>>>>> I need an additional optimization that removes the lwsync. I can not implement >>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>> >>>>>>> Back to the IRIW problem. I think here we have a comparable issue. >>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read >>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>> >>>>>>> I would propose to guard the code by >>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>> Else, David, how would you propose to implement this platform independent? 
>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>> >>>>>> I can not possibly answer to the necessary level of detail with a few >>>>>> moments thought. You are implying there is a problem here that will >>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>> different?) and I can not take that on face value at the moment. The >>>>>> only reason I can see IRIW not being handled by the JMM requirements for >>>>>> volatile accesses is if there are global visibility issues that are not >>>>>> addressed - but even then I would expect heavy barriers at the store >>>>>> would deal with that, not at the load. (This situation reminds me of the >>>>>> need for read-barriers on Alpha architecture due to the use of software >>>>>> cache-coherency rather than hardware cache-coherency - but we don't have >>>>>> that on ppc!) >>>>>> >>>>>> Sorry - There is no quick resolution here and in a couple of days I will >>>>>> be heading out on vacation for two weeks. >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -- Other ports: >>>>>>> The IRIW issue requires at least 3 processors to be relevant, so it might >>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>>>> >>>>>>> -- MemBarStoreStore after initialization >>>>>>> I agree we should not change it in the ppc port. If you wish, I can >>>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>> To: Vladimir Kozlov >>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>>>>> >>>>>>> Okay this is my second attempt at answering this in a reasonable way :) >>>>>>> >>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>> I have to ask David to do correctness evaluation. >>>>>>> >>>>>>> From what I understand what we see here is an attempt to fix an >>>>>>> existing issue with the implementation of volatiles so that the IRIW >>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>> volatile reads extremely heavyweight by adding a fence() when doing the >>>>>>> load. >>>>>>> >>>>>>> Now if this was purely handled in ppc64 source code then I would be >>>>>>> happy to let them do whatever they like (surely this kills performance >>>>>>> though!). But I do not agree with the changes to the shared code that >>>>>>> allow this solution to be implemented - even with PPC64_ONLY this is >>>>>>> polluting the shared code. My concern is similar to what I said with the >>>>>>> taskQueue changes - these algorithms should be expressed using the >>>>>>> correct OrderAccess operations to guarantee the desired properties >>>>>>> independent of architecture. If such a "barrier" is not needed on a >>>>>>> given architecture then the implementation in OrderAccess should reduce >>>>>>> to a no-op. >>>>>>> >>>>>>> And as Vitaly points out the constructor barriers are not needed under >>>>>>> the JMM. 
>>>>>>> >>>>>>>> I am fine with suggested changes because you did not change our current >>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>> But may be it should be done using more general query which is set >>>>>>>> depending on platform: >>>>>>>> >>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>> >>>>>>>> or similar to what we use now: >>>>>>>> >>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>> >>>>>>> Every platform has to support IRIW this is simply part of the Java >>>>>>> Memory Model, there should not be any need to call this out explicitly >>>>>>> like this. >>>>>>> >>>>>>> Is there some subtlety of the hardware I am missing here? Are there >>>>>>> visibility issues beyond the ordering constraints that the JMM defines? >>>>>>>> From what I understand our ppc port is also affected. David? >>>>>>> >>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> In library_call.cpp can you add {}? New comment should be inside else {}. >>>>>>>> >>>>>>>> I think you should make _wrote_volatile field not ppc64 specific which >>>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY() >>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs. >>>>>>>> >>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>> >>>>>>>> if (is_vol) { >>>>>>>> // See comment in do_get_xxx(). >>>>>>>> #ifndef PPC64 >>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>> #else >>>>>>>> if (is_field) { >>>>>>>> // Add MemBarRelease for constructors which write volatile field >>>>>>>> (PPC64). 
>>>>>>>> set_wrote_volatile(true); >>>>>>>> } >>>>>>>> #endif >>>>>>>> } >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of >>>>>>>>> the torture test suite: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>> >>>>>>>>> Example: >>>>>>>>> volatile x=0, y=0 >>>>>>>>> __________ __________ __________ __________ >>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>> >>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>> read(y) read(x) >>>>>>>>> >>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>> >>>>>>>>> >>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only >>>>>>>>> assured by the sync instruction and if it is executed in the threads >>>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire >>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>> >>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs >>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>> >>>>>>>>> Further this change contains a fix that assures that volatile fields >>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>> published. >>>>>>>>> >>>>>>>>> >>>>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too >>>>>>>>> strong. >>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice. >>>>>>>>> What do you think? >>>>>>>>> >>>>>>>>> Please review and test this change. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz.
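[Editorial illustration, not part of the original thread.] The IRIW example quoted above can be sketched as a small litmus test. This is only a rough model under stated assumptions: it uses C++ `std::atomic` with the default sequentially consistent ordering as a stand-in for Java volatile fields, it is not the VolatileIRIWTest from the torture suite and not HotSpot code, and the names `IriwResult`, `run_iriw_once`, `is_forbidden` and `count_forbidden_iriw` are hypothetical. Two writers store to `x` and `y`, two readers load them in opposite orders, and the "Disallowed" outcome is the one where the readers disagree about which write happened first.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What the two reader threads observed in one run (hypothetical helper type).
struct IriwResult {
    int r1, r2;  // reader 1: x first, then y
    int r3, r4;  // reader 3: y first, then x
};

// One IRIW run: two independent writers, two independent readers.
// All accesses use the default memory_order_seq_cst, modelling volatile.
IriwResult run_iriw_once() {
    std::atomic<int> x{0}, y{0};
    IriwResult res{0, 0, 0, 0};
    std::thread t0([&] { x.store(1); });
    std::thread t1([&] { res.r1 = x.load(); res.r2 = y.load(); });
    std::thread t2([&] { y.store(1); });
    std::thread t3([&] { res.r3 = y.load(); res.r4 = x.load(); });
    t0.join(); t1.join(); t2.join(); t3.join();
    return res;
}

// The "Disallowed" outcome from the mail: reader 1 sees x=1, y=0 while
// reader 3 sees y=1, x=0, i.e. the readers disagree on the write order.
bool is_forbidden(const IriwResult& r) {
    return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}

// Repeat the experiment and count forbidden outcomes.
int count_forbidden_iriw(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (is_forbidden(run_iriw_once())) ++forbidden;
    }
    return forbidden;
}
```

With sequentially consistent atomics a conforming compiler has to emit whatever barriers the target needs so that `count_forbidden_iriw(n)` stays at 0. How that is achieved on a CPU that is not multiple-copy atomic (for instance a full barrier on the load side, as proposed in this thread for PPC64) is a choice of the implementation and is not shown by this sketch.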
>>>>>>>>> From volker.simonis at gmail.com Thu Jan 16 00:08:35 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 16 Jan 2014 09:08:35 +0100 Subject: RFR(S): JDK-8031134 : PPC64: implement printing on AIX Message-ID: Resending one more time to 2d-dev upon request: Hi, could somebody please review the following small change: http://cr.openjdk.java.net/~simonis/webrevs/8031134/ It's the straight forward implementation of the basic printing infrastructure on AIX and shouldn't have any impact on the existing platforms. As always, this change is intended for the http://hg.openjdk.java.net/ppc-aix-port/stage/jdk repository. Thank you and best regards, Volker From david.holmes at oracle.com Thu Jan 16 00:34:10 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 16 Jan 2014 18:34:10 +1000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52D78697.2090408@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> Message-ID: <52D79982.4060100@oracle.com> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: > This is becoming ugly #ifdef mess. In compiler code we are trying to > avoid them. 
I suggested to have _wrote_volatile without #ifdef and I > want to keep it this way, it could be useful to have such info on other > platforms too. But I would suggest to remove PPC64 comments in parse.hpp. > > In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value > which could be checked in all places instead of #ifdef: I asked for the ifdef some time back as I find it much preferable to have this as a build-time construct rather than a runtime one. I don't want to have to pay anything for this if we don't use it. David > #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC > const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; > #else > const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; > #endif > > or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever > > and then: > > #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ > oop p = JNIHandles::resolve(obj); \ > + if (support_IRIW_for_not_multiple_copy_atomic_cpu) > OrderAccess::fence(); \ > volatile type_name v = OrderAccess::load_acquire((volatile > type_name*)index_oop_from_field_offset_long(p, offset)); > > And: > > + if (support_IRIW_for_not_multiple_copy_atomic_cpu && > field->is_volatile()) { > + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier > + } > > And so on. The comments will be needed only in globalDefinitions.hpp > > The code in parse1.cpp could be put on one line: > > + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && > method()->is_initializer()) )) { > > Thanks, > Vladimir > > On 1/15/14 9:25 PM, David Holmes wrote: >> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>> Hi David, >>> >>> I updated the webrev: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>> >>> - I removed the IRIW example in parse3.cpp >>> - I adapted the comments not to point to that comment, and to >>> reflect the new flagging. Also I mention that we support the >>> volatile constructor issue, but that it's not standard. 
>>> - I protected issuing the barrier for the constructor by PPC64. >>> I also think it's better to separate these this way. >> >> Sorry if I wasn't clear but I'd like the wrote_volatile field >> declaration and all uses to be guarded by ifdef PPC64 too >> please. >> >> One nit I missed before. In src/share/vm/opto/library_call.cpp this >> comment doesn't make much sense to me and refers to >> ppc specific stuff in a shared file: >> >> if (is_volatile) { >> ! if (!is_store) { >> insert_mem_bar(Op_MemBarAcquire); >> ! } else { >> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >> insert_mem_bar(Op_MemBarVolatile); >> + #endif >> + } >> >> I don't think the comment is needed. >> >> Thanks, >> David >> >>> Thanks for your comments! >>> >>> Best regards, >>> Goetz. >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Mittwoch, 15. Januar 2014 01:55 >>> To: Lindenmaier, Goetz >>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>> Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> Sorry for the delay in getting back to this. >>> >>> The general changes to the volatile barriers to support IRIW are okay. >>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>> specifically it is >>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>> the commentary excessive, particularly for shared code. In particular >>> the IRIW example in parse3.cpp - it seems a strange place to give the >>> explanation and I don't think we need it to that level of detail. Seems >>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>> >>> The changes related to volatile writes in the constructor, as discussed >>> are not required by the Java Memory Model. 
If you want to keep these >>> then I think they should all be guarded with PPC64 because it is not >>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>> PPC64 porters. >>> >>> Thanks, >>> David >>> >>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> I updated this webrev. I detected a small flaw I made when editing >>>> this version. >>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>> I also based the webrev on the latest version of the stage repo. >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>> >>>> Best regards, >>>> Goetz. >>>> >>>> -----Original Message----- >>>> From: Lindenmaier, Goetz >>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>> To: David Holmes >>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>> Independent Reads of Independent Writes >>>> >>>> Hi David, >>>> >>>>> So we can at least undo #4 now we have established those tests were >>>>> not >>>>> required to pass. >>>> We would prefer if we could keep this in. We want to avoid that it's >>>> blamed on the VM if java programs are failing on PPC after they worked >>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>> it by >>>> a flag as proposed. But if you insist I will remove it. Also, this >>>> part is >>>> not that performance relevant. >>>> >>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>> think >>>> I added a compile-time guard in this new webrev: >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>> several double negations I don't like, (#ifNdef >>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>> but this way I only have to change the ppc platform. >>>> >>>> Best regards, >>>> Goetz >>>> >>>> P.S.: I will also be available over the Christmas period. 
>>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>> To: Lindenmaier, Goetz >>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>> Independent Reads of Independent Writes >>>> >>>> Sorry for the delay, it takes a while to catch up after two weeks >>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>> emails. >>>> >>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>> Hi, >>>>> >>>>> ok, I understand the tests are wrong. It's good this issue is >>>>> settled. >>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>> >>>>> About our change: David, the causality is the other way round. >>>>> The change is about IRIW. >>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>> >>>> This is the part I still have some question marks over as the >>>> implications are not nice for performance on non-TSO platforms. But I'm >>>> no further along in processing that paper I'm afraid. >>>> >>>>> 2. If we do syncs before loads, we don't need to do them after stores. >>>>> 3. If we don't do them after stores, we fail the volatile >>>>> constructor tests. >>>>> 4. So finally we added them again at the end of the constructor >>>>> after stores >>>>> to pass the volatile constructor tests. >>>> >>>> So we can at least undo #4 now we have established those tests were not >>>> required to pass. >>>> >>>>> We originally passed the constructor tests because the ppc memory >>>>> order >>>>> instructions are not as find-granular as the >>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>> The only instruction >>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>> therefore the >>>>> MemBarVolatile after the store fixes the constructor tests. 
The >>>>> proper representation >>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>> it's pointless >>>>> anyways. >>>>> >>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>> I'd be happy to add a property >>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>> >>>> A compile-time guard (ifdef) would be better than a runtime one I think >>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>> based not architecture based) as that will allows for turning this >>>> on/off for any architecture for testing purposes. >>>> >>>> Thanks, >>>> David >>>> >>>>> or the like to guard the customization. I'd like that much better. >>>>> Or also >>>>> OrderAccess::needs_support_iriw_ordering() >>>>> VM_Version::needs_support_iriw_ordering() >>>>> >>>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>> To: Lindenmaier, Goetz >>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>> Independent Reads of Independent Writes >>>>> >>>>> TL;DR version: >>>>> >>>>> Discussion on the c-i list has now confirmed that a >>>>> constructor-barrier >>>>> for volatiles is not required as part of the JMM specification. It >>>>> *may* >>>>> be required in an implementation that doesn't pre-zero memory to >>>>> ensure >>>>> you can't see uninitialized fields. So the tests for this are invalid >>>>> and this part of the patch is not needed in general (ppc64 may need it >>>>> due to other factors). >>>>> >>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>> term :) >>>>> Second thanks for the reference to that paper! 
For reference: >>>>> >>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>> complex interconnect) does not guarantee that a write becomes >>>>> visible to >>>>> all other hardware threads at the same time point; these architectures >>>>> are not multiple-copy atomic." >>>>> >>>>> This is the visibility issue that I referred to and affects both >>>>> ARM and >>>>> PPC. But of course it is normally handled by using suitable barriers >>>>> after the stores that need to be visible. I think the crux of the >>>>> current issue is what you wrote below: >>>>> >>>>> > The fixes for the constructor issue are only needed because we >>>>> > remove the sync instruction from behind stores (parse3.cpp:320) >>>>> > and place it before loads. >>>>> >>>>> I hadn't grasped this part. Obviously if you fail to do the sync after >>>>> the store then you have to do something around the loads to get the >>>>> same >>>>> results! I still don't know what lead you to the conclusion that the >>>>> only way to fix the IRIW issue was to put the fence before the load - >>>>> maybe when I get the chance to read that paper in full it will be >>>>> clearer. >>>>> >>>>> So ... the basic problem is that the current structure in the VM has >>>>> hard-wired one choice of how to get the right semantics for volatile >>>>> variables. You now want to customize that but not all the requisite >>>>> hooks are present. It would be better if volatile_load and >>>>> volatile_store were factored out so that they could be implemented as >>>>> desired per-platform. Alternatively there could be pre- and post- >>>>> hooks >>>>> that could then be customized per platform. Otherwise you need >>>>> platform-specific ifdef's to handle it as per your patch. >>>>> >>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>> this >>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>> abstractions are a confused mess in my opinion. 
>>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>> >>>>>> I did not yet add the >>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>> VM_Version::needs_support_iriw_ordering() >>>>>> or >>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>> to reduce #defined, as I got no further comment on that. >>>>>> >>>>>> >>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>> I feel not in the position to contribute substantially. >>>>>> >>>>>> But we would like to pass the torture test suite as we consider >>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>> Java code runs fine on the more common x86 platform, and then >>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>> >>>>>> The fixes for the constructor issue are only needed because we >>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>> and place it before loads. Then there is no sync between volatile >>>>>> store >>>>>> and publishing the object. So we add it again in this one case >>>>>> (volatile store in constructor). >>>>>> >>>>>> >>>>>> @David >>>>>>>> Sure. There also is no solution as you require for the >>>>>>>> taskqueue problem yet, >>>>>>>> and that's being discussed now for almost a year. >>>>>>> It may have started a year ago but work on it has hardly been >>>>>>> continuous. >>>>>> That's not true, we did a lot of investigation and testing on this >>>>>> issue. >>>>>> And we came up with a solution we consider the best possible. If you >>>>>> have objections, you should at least give the draft of a better >>>>>> solution, >>>>>> we would volunteer to implement and test it. 
>>>>>> Similarly, we invested time in fixing the concurrency torture issues. >>>>>> >>>>>> @David >>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>> and >>>>>>> can't find any reference to it. >>>>>> We learned about this reading "A Tutorial Introduction to the ARM and >>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>> Work-Stealing for >>>>>> Weak Memory Models" by Nhat Minh L?, Antoniu Pop, Albert Cohen >>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the >>>>>> taskqueue problem. >>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>> >>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>> used 'read' >>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>> above). >>>>>> >>>>>> Best regards and thanks for all your involvements, >>>>>> Goetz. >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi Goetz, >>>>>> >>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> -- Volatile in constuctor >>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>> missing constructor barrier. >>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>> processors >>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!) >>>>>> >>>>>> And see follow ups - the tests are invalid. >>>>>> >>>>>>> -- IRIW issue >>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>> a few >>>>>>>> moments thought. >>>>>>> Sure. 
There also is no solution as you require for the taskqueue >>>>>>> problem yet, >>>>>>> and that's being discussed now for almost a year. >>>>>> >>>>>> It may have started a year ago but work on it has hardly been >>>>>> continuous. >>>>>> >>>>>>>> You are implying there is a problem here that will >>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>> different?) >>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>> I contributed a >>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>> nice, I admit. >>>>>>> (I don't really know about ARM, though). >>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>> are evaluated >>>>>>> by the C-compiler I'm happy. >>>>>>> >>>>>>> The problem is not that IRIW is not handled by the JMM, the problem >>>>>>> is that >>>>>>> store >>>>>>> sync >>>>>>> does not assure multiple-read-atomicity, >>>>>>> only >>>>>>> sync >>>>>>> load >>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>> pass that test. >>>>>> >>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>>>> can't find any reference to it. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> The JMM is fine. And >>>>>>> store >>>>>>> MemBarVolatile >>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>> that >>>>>>> do what is required. >>>>>>> >>>>>>> So if you are off soon, please let's come to a solution that >>>>>>> might be improvable in the way it's implemented, but that >>>>>>> allows us to implement a correct PPC64 port. >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. 
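The two barrier placements contrasted above (store followed by sync, versus sync followed by load) can be sketched in portable C++. This is only an illustration under stated assumptions, not the HotSpot change itself: the helper names volatile_load and volatile_store and the constant kFenceBeforeVolatileLoad are hypothetical, and std::atomic_thread_fence stands in for the PPC sync instruction. The sketch shows where the full fence sits in each scheme; it makes no claim about what the C++ memory model guarantees for IRIW.

```cpp
#include <atomic>

// Hypothetical compile-time switch: a port to a CPU that is not
// multiple-copy atomic (PPC64 in this thread) would define the macro.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
constexpr bool kFenceBeforeVolatileLoad = true;
#else
constexpr bool kFenceBeforeVolatileLoad = false;
#endif

// Volatile read: "sync; load; acquire" when the switch is on,
// plain load-acquire otherwise.
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (kFenceBeforeVolatileLoad) {
    // Full fence before the load (typically a sync on PPC64).
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return field.load(std::memory_order_acquire);
}

// Volatile write: release-store, followed by a full StoreLoad fence
// only on platforms that keep the fence on the store side.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!kFenceBeforeVolatileLoad) {
    // Corresponds to the MemBarVolatile after the store.
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
}
```

Because the switch is a compile-time constant, a compiler can drop the untaken branch, so platforms that do not need the fence before loads should pay nothing for the check.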
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>> 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi Goetz, >>>>>>> >>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi everybody, >>>>>>>> >>>>>>>> thanks a lot for the detailed reviews! >>>>>>>> I'll try to answer to all in one mail. >>>>>>>> >>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>> to occur before the reference is assigned; >>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>> initializing >>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>> Shipilev >>>>>>>> and Doug Lea, and they agreed. >>>>>>>> Also, concurrency torture tests >>>>>>>> LongVolatileTest >>>>>>>> AtomicIntegerInitialValueTest >>>>>>>> will fail. >>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>> volatile field would be >>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>> AtomicInteger.) >>>>>>> >>>>>>> The affects of unsafe publication are always surprising - >>>>>>> volatiles do >>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>> that >>>>>>> requires the constructor barrier - discussions with Doug and Aleksey >>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>> missing constructor barrier. >>>>>>> >>>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>> don't >>>>>>>> see a way to implement this cheaper. 
>>>>>>>> >>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>> OrderAccess operations >>>>>>>> Basically, I agree on this. But you also have to take into account >>>>>>>> that due to the different memory ordering instructions on >>>>>>>> different platforms >>>>>>>> just implementing something empty is not sufficient. >>>>>>>> An example: >>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>> If these are consecutively in the code, sparc code looks like this: >>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>> Assembler::StoreStore) >>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>> Just doing what is required. >>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>> fine grained operations: >>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>> StoreStore, LoadLoad >>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>> are more (too) powerful, >>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>> can not implement >>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>> >>>>>>>> Back to the IRIW problem. I think here we have a comparable issue. >>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>> read >>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>> >>>>>>>> I would propose to guard the code by >>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>> independent? >>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>> >>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>> few >>>>>>> moments thought. 
You are implying there is a problem here that will >>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>> different?) and I can not take that on face value at the moment. The >>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>> requirements for >>>>>>> volatile accesses is if there are global visibility issues that >>>>>>> are not >>>>>>> addressed - but even then I would expect heavy barriers at the store >>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>> of the >>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>> software >>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>> don't have >>>>>>> that on ppc!) >>>>>>> >>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>> I will >>>>>>> be heading out on vacation for two weeks. >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -- Other ports: >>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>> it might >>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>>>>> >>>>>>>> -- MemBarStoreStore after initialization >>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>> can >>>>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>> To: Vladimir Kozlov >>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>> way :) >>>>>>>> >>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>> >>>>>>>> From what I understand what we see here is an attempt to >>>>>>>> fix an >>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>> IRIW >>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>> doing the >>>>>>>> load. >>>>>>>> >>>>>>>> Now if this was purely handled in ppc64 source code then I would be >>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>> performance >>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>> that >>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>> this is >>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>> with the >>>>>>>> taskQueue changes - these algorithms should be expressed using the >>>>>>>> correct OrderAccess operations to guarantee the desired properties >>>>>>>> independent of architecture. If such a "barrier" is not needed on a >>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>> reduce >>>>>>>> to a no-op. 
>>>>>>>> >>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>> under >>>>>>>> the JMM. >>>>>>>> >>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>> current >>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>> But may be it should be done using more general query which is set >>>>>>>>> depending on platform: >>>>>>>>> >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> >>>>>>>>> or similar to what we use now: >>>>>>>>> >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>> >>>>>>>> Every platform has to support IRIW this is simply part of the Java >>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>> explicitly >>>>>>>> like this. >>>>>>>> >>>>>>>> Is there some subtlety of the hardware I am missing here? Are there >>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>> defines? >>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>> David? >>>>>>>> >>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>> inside else {}. >>>>>>>>> >>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>> specific which >>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>> PPC64_ONLY() >>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs. >>>>>>>>> >>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>> >>>>>>>>> if (is_vol) { >>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>> #ifndef PPC64 >>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>> #else >>>>>>>>> if (is_field) { >>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>> volatile field >>>>>>>>> (PPC64). 
>>>>>>>>> set_wrote_volatile(true); >>>>>>>>> } >>>>>>>>> #endif >>>>>>>>> } >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>> VolatileIRIWTest of >>>>>>>>>> the torture test suite: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>> >>>>>>>>>> Example: >>>>>>>>>> volatile x=0, y=0 >>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>> >>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>> read(y) read(x) >>>>>>>>>> >>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>> is only >>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>> threads >>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>> sync-load-acquire >>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>> >>>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs >>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>> >>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>> fields >>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>> published. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>> seems too >>>>>>>>>> strong. >>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>> suffice. >>>>>>>>>> What do you think? >>>>>>>>>> >>>>>>>>>> Please review and test this change. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. 
>>>>>>>>>> From vladimir.kozlov at oracle.com Thu Jan 16 00:54:57 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 16 Jan 2014 00:54:57 -0800 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52D79982.4060100@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> Message-ID: <52D79E61.1060801@oracle.com> On 1/16/14 12:34 AM, David Holmes wrote: > On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >> This is becoming ugly #ifdef mess. In compiler code we are trying to >> avoid them. I suggested to have _wrote_volatile without #ifdef and I >> want to keep it this way, it could be useful to have such info on other >> platforms too. But I would suggest to remove PPC64 comments in parse.hpp. >> >> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >> which could be checked in all places instead of #ifdef: > > I asked for the ifdef some time back as I find it much preferable to have this as a build-time construct rather than a > runtime one. I don't want to have to pay anything for this if we don't use it. 
Any decent C++ compiler will optimize expressions with such constants defined in header files. I insist to avoid #ifdefs in C2 code. I really don't like the code with #ifdef in unsafe.cpp but I can live with it. Vladimir > > David > >> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >> #else >> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >> #endif >> >> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >> >> and then: >> >> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >> oop p = JNIHandles::resolve(obj); \ >> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >> OrderAccess::fence(); \ >> volatile type_name v = OrderAccess::load_acquire((volatile >> type_name*)index_oop_from_field_offset_long(p, offset)); >> >> And: >> >> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >> field->is_volatile()) { >> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >> + } >> >> And so on. The comments will be needed only in globalDefinitions.hpp >> >> The code in parse1.cpp could be put on one line: >> >> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >> method()->is_initializer()) )) { >> >> Thanks, >> Vladimir >> >> On 1/15/14 9:25 PM, David Holmes wrote: >>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>> Hi David, >>>> >>>> I updated the webrev: >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>> >>>> - I removed the IRIW example in parse3.cpp >>>> - I adapted the comments not to point to that comment, and to >>>> reflect the new flagging. Also I mention that we support the >>>> volatile constructor issue, but that it's not standard. >>>> - I protected issuing the barrier for the constructor by PPC64. >>>> I also think it's better to separate these this way. >>> >>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>> declaration and all uses to be guarded by ifdef PPC64 too >>> please. >>> >>> One nit I missed before. 
In src/share/vm/opto/library_call.cpp this >>> comment doesn't make much sense to me and refers to >>> ppc specific stuff in a shared file: >>> >>> if (is_volatile) { >>> ! if (!is_store) { >>> insert_mem_bar(Op_MemBarAcquire); >>> ! } else { >>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>> insert_mem_bar(Op_MemBarVolatile); >>> + #endif >>> + } >>> >>> I don't think the comment is needed. >>> >>> Thanks, >>> David >>> >>>> Thanks for your comments! >>>> >>>> Best regards, >>>> Goetz. >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>> To: Lindenmaier, Goetz >>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>> Independent Reads of Independent Writes >>>> >>>> Hi Goetz, >>>> >>>> Sorry for the delay in getting back to this. >>>> >>>> The general changes to the volatile barriers to support IRIW are okay. >>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>> specifically it is >>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>> the commentary excessive, particularly for shared code. In particular >>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>> explanation and I don't think we need it to that level of detail. Seems >>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>> >>>> The changes related to volatile writes in the constructor, as discussed >>>> are not required by the Java Memory Model. If you want to keep these >>>> then I think they should all be guarded with PPC64 because it is not >>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>> PPC64 porters. 
>>>> >>>> Thanks, >>>> David >>>> >>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>> Hi, >>>>> >>>>> I updated this webrev. I detected a small flaw I made when editing >>>>> this version. >>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>> I also based the webrev on the latest version of the stage repo. >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> -----Original Message----- >>>>> From: Lindenmaier, Goetz >>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>> To: David Holmes >>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>> Independent Reads of Independent Writes >>>>> >>>>> Hi David, >>>>> >>>>>> So we can at least undo #4 now we have established those tests were >>>>>> not >>>>>> required to pass. >>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>> blamed on the VM if java programs are failing on PPC after they worked >>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>> it by >>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>> part is >>>>> not that performance relevant. >>>>> >>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>> think >>>>> I added a compile-time guard in this new webrev: >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>> several double negations I don't like, (#ifNdef >>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>> but this way I only have to change the ppc platform. >>>>> >>>>> Best regards, >>>>> Goetz >>>>> >>>>> P.S.: I will also be available over the Christmas period. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Freitag, 20. 
Dezember 2013 05:58 >>>>> To: Lindenmaier, Goetz >>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>> Independent Reads of Independent Writes >>>>> >>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>> emails. >>>>> >>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>> settled. >>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>> >>>>>> About our change: David, the causality is the other way round. >>>>>> The change is about IRIW. >>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>> >>>>> This is the part I still have some question marks over as the >>>>> implications are not nice for performance on non-TSO platforms. But I'm >>>>> no further along in processing that paper I'm afraid. >>>>> >>>>>> 2. If we do syncs before loads, we don't need to do them after stores. >>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>> constructor tests. >>>>>> 4. So finally we added them again at the end of the constructor >>>>>> after stores >>>>>> to pass the volatile constructor tests. >>>>> >>>>> So we can at least undo #4 now we have established those tests were not >>>>> required to pass. >>>>> >>>>>> We originally passed the constructor tests because the ppc memory >>>>>> order >>>>>> instructions are not as find-granular as the >>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>> The only instruction >>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>> therefore the >>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>> proper representation >>>>>> of the fix in the IR would be adding a MemBarStoreStore. 
But now >>>>>> it's pointless >>>>>> anyways. >>>>>> >>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>> I'd be happy to add a property >>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>> >>>>> A compile-time guard (ifdef) would be better than a runtime one I think >>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>> based not architecture based) as that will allow for turning this >>>>> on/off for any architecture for testing purposes. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> or the like to guard the customization. I'd like that much better. >>>>>> Or also >>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>> VM_Version::needs_support_iriw_ordering() >>>>>> >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Thursday, 28 November 2013 00:34 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> TL;DR version: >>>>>> >>>>>> Discussion on the c-i list has now confirmed that a >>>>>> constructor-barrier >>>>>> for volatiles is not required as part of the JMM specification. It >>>>>> *may* >>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>> ensure >>>>>> you can't see uninitialized fields. So the tests for this are invalid >>>>>> and this part of the patch is not needed in general (ppc64 may need it >>>>>> due to other factors). >>>>>> >>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>> term :) >>>>>> Second thanks for the reference to that paper!
For reference: >>>>>> >>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>> complex interconnect) does not guarantee that a write becomes >>>>>> visible to >>>>>> all other hardware threads at the same time point; these architectures >>>>>> are not multiple-copy atomic." >>>>>> >>>>>> This is the visibility issue that I referred to and affects both >>>>>> ARM and >>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>> after the stores that need to be visible. I think the crux of the >>>>>> current issue is what you wrote below: >>>>>> >>>>>> > The fixes for the constructor issue are only needed because we >>>>>> > remove the sync instruction from behind stores (parse3.cpp:320) >>>>>> > and place it before loads. >>>>>> >>>>>> I hadn't grasped this part. Obviously if you fail to do the sync after >>>>>> the store then you have to do something around the loads to get the >>>>>> same >>>>>> results! I still don't know what led you to the conclusion that the >>>>>> only way to fix the IRIW issue was to put the fence before the load - >>>>>> maybe when I get the chance to read that paper in full it will be >>>>>> clearer. >>>>>> >>>>>> So ... the basic problem is that the current structure in the VM has >>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>> variables. You now want to customize that but not all the requisite >>>>>> hooks are present. It would be better if volatile_load and >>>>>> volatile_store were factored out so that they could be implemented as >>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>> hooks >>>>>> that could then be customized per platform. Otherwise you need >>>>>> platform-specific ifdefs to handle it as per your patch. >>>>>> >>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>> this >>>>>> is an area where a lot of clean up is needed in the VM.
The barrier >>>>>> abstractions are a confused mess in my opinion. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>> >>>>>>> I did not yet add the >>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>> or >>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>> >>>>>>> >>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>> I feel not in the position to contribute substantially. >>>>>>> >>>>>>> But we would like to pass the torture test suite as we consider >>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>> >>>>>>> The fixes for the constructor issue are only needed because we >>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>> store >>>>>>> and publishing the object. So we add it again in this one case >>>>>>> (volatile store in constructor). >>>>>>> >>>>>>> >>>>>>> @David >>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>> taskqueue problem yet, >>>>>>>>> and that's being discussed now for almost a year. >>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>> continuous. >>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>> issue. >>>>>>> And we came up with a solution we consider the best possible. 
If you >>>>>>> have objections, you should at least give the draft of a better >>>>>>> solution, >>>>>>> we would volunteer to implement and test it. >>>>>>> Similarly, we invested time in fixing the concurrency torture issues. >>>>>>> >>>>>>> @David >>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>> and >>>>>>>> can't find any reference to it. >>>>>>> We learned about this reading "A Tutorial Introduction to the ARM and >>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>> Work-Stealing for >>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>> taskqueue problem. >>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>> >>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>> used 'read' >>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>> above). >>>>>>> >>>>>>> Best regards and thanks for all your involvements, >>>>>>> Goetz. >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi Goetz, >>>>>>> >>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> -- Volatile in constructor >>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>> missing constructor barrier. >>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>> processors >>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>>>>>> >>>>>>> And see follow ups - the tests are invalid. >>>>>>> >>>>>>>> -- IRIW issue >>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>> a few >>>>>>>>> moments thought. >>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>> problem yet, >>>>>>>> and that's being discussed now for almost a year. >>>>>>> >>>>>>> It may have started a year ago but work on it has hardly been >>>>>>> continuous. >>>>>>> >>>>>>>>> You are implying there is a problem here that will >>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>> different?) >>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>> I contributed a >>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>> nice, I admit. >>>>>>>> (I don't really know about ARM, though). >>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>> are evaluated >>>>>>>> by the C-compiler I'm happy. >>>>>>>> >>>>>>>> The problem is not that IRIW is not handled by the JMM, the problem >>>>>>>> is that >>>>>>>> store >>>>>>>> sync >>>>>>>> does not assure multiple-read-atomicity, >>>>>>>> only >>>>>>>> sync >>>>>>>> load >>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>> pass that test. >>>>>>> >>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and >>>>>>> can't find any reference to it. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> The JMM is fine. And >>>>>>>> store >>>>>>>> MemBarVolatile >>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>> that >>>>>>>> do what is required. >>>>>>>> >>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>> 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi Goetz, >>>>>>>> >>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi everybody, >>>>>>>>> >>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>> I'll try to answer to all in one mail. >>>>>>>>> >>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>> to occur before the reference is assigned; >>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>> initializing >>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>> Shipilev >>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>> Also, concurrency torture tests >>>>>>>>> LongVolatileTest >>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>> will fail. >>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>> volatile field would be >>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>> AtomicInteger.) >>>>>>>> >>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>> volatiles do >>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>> that >>>>>>>> requires the constructor barrier - discussions with Doug and Aleksey >>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>> missing constructor barrier. >>>>>>>> >>>>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight >>>>>>>>> Yes, it costs measurable performance. But else it is wrong. 
We >>>>>>>>> don't >>>>>>>>> see a way to implement this cheaper. >>>>>>>>> >>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>> OrderAccess operations >>>>>>>>> Basically, I agree on this. But you also have to take into account >>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>> different platforms >>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>> An example: >>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>> If these are consecutively in the code, sparc code looks like this: >>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>> Assembler::StoreStore) >>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>> Just doing what is required. >>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>> fine grained operations: >>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>> StoreStore, LoadLoad >>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>> are more (too) powerful, >>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>> can not implement >>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>> >>>>>>>>> Back to the IRIW problem. I think here we have a comparable issue. >>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>> read >>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>> >>>>>>>>> I would propose to guard the code by >>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>> independent? >>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) 
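To make the IRIW outcome under discussion concrete, here is a minimal standalone C++ sketch (not HotSpot code) that enumerates every interleaving of the four threads from the VolatileIRIWTest example over one shared memory, that is, assuming a write becomes visible to all threads at the same time. All names in it (State, step, explore, sc_outcomes, iriw_forbidden_reachable) are hypothetical. Under that assumption the disallowed result, where thread 1 sees x=1, y=0 while thread 3 sees y=1, x=0, is never produced; hardware that is not multiple-copy atomic does not satisfy the assumption, which is what the extra sync is meant to compensate for.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <tuple>

// Outcome = (r1, r2, r3, r4):
//   thread 1: r1 = x; r2 = y;     thread 3: r3 = y; r4 = x;
using Outcome = std::tuple<int, int, int, int>;

struct State {
  int x = 0, y = 0;                    // the two volatile fields
  std::array<int, 4> r{{0, 0, 0, 0}};  // r1..r4
  std::array<int, 4> pc{{0, 0, 0, 0}}; // next operation index per thread
};

// Number of operations per thread: T0 writes x, T1 reads x then y,
// T2 writes y, T3 reads y then x.
static const int kOps[4] = {1, 2, 1, 2};

// Execute the next operation of thread t against the single shared memory.
static void step(State& s, int t) {
  int i = s.pc[t]++;
  assert(i < kOps[t]);
  switch (t) {
    case 0: s.x = 1; break;
    case 1: if (i == 0) s.r[0] = s.x; else s.r[1] = s.y; break;
    case 2: s.y = 1; break;
    case 3: if (i == 0) s.r[2] = s.y; else s.r[3] = s.x; break;
  }
}

// Depth-first enumeration of all interleavings; records each final outcome.
static void explore(State s, std::set<Outcome>& out) {
  bool done = true;
  for (int t = 0; t < 4; ++t) {
    if (s.pc[t] < kOps[t]) {
      done = false;
      State next = s;
      step(next, t);
      explore(next, out);
    }
  }
  if (done) out.insert(Outcome(s.r[0], s.r[1], s.r[2], s.r[3]));
}

// All outcomes reachable when every write is visible to everyone at once.
std::set<Outcome> sc_outcomes() {
  std::set<Outcome> out;
  explore(State(), out);
  return out;
}

// True if "x=1, y=0" (thread 1) and "y=1, x=0" (thread 3) can both be seen.
bool iriw_forbidden_reachable() {
  return sc_outcomes().count(Outcome(1, 0, 1, 0)) != 0;
}
```

The enumeration only models an idealized memory; it says nothing about which barrier a given CPU needs, only which result the test treats as illegal.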
>>>>>>>> >>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>> few >>>>>>>> moments thought. You are implying there is a problem here that will >>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>> different?) and I can not take that on face value at the moment. The >>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>> requirements for >>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>> are not >>>>>>>> addressed - but even then I would expect heavy barriers at the store >>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>> of the >>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>> software >>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>> don't have >>>>>>>> that on ppc!) >>>>>>>> >>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>> I will >>>>>>>> be heading out on vacation for two weeks. >>>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> -- Other ports: >>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>> it might >>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them). >>>>>>>>> >>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>> can >>>>>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>> To: Vladimir Kozlov >>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>> way :) >>>>>>>>> >>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>> >>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>> fix an >>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>> IRIW >>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>> doing the >>>>>>>>> load. >>>>>>>>> >>>>>>>>> Now if this was purely handled in ppc64 source code then I would be >>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>> performance >>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>> that >>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>> this is >>>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>>> with the >>>>>>>>> taskQueue changes - these algorithms should be expressed using the >>>>>>>>> correct OrderAccess operations to guarantee the desired properties >>>>>>>>> independent of architecture. If such a "barrier" is not needed on a >>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>> reduce >>>>>>>>> to a no-op. 
>>>>>>>>> >>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>> under >>>>>>>>> the JMM. >>>>>>>>> >>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>> current >>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>> But may be it should be done using more general query which is set >>>>>>>>>> depending on platform: >>>>>>>>>> >>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>> >>>>>>>>>> or similar to what we use now: >>>>>>>>>> >>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> >>>>>>>>> Every platform has to support IRIW this is simply part of the Java >>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>> explicitly >>>>>>>>> like this. >>>>>>>>> >>>>>>>>> Is there some subtlety of the hardware I am missing here? Are there >>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>> defines? >>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>> David? >>>>>>>>> >>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>> inside else {}. >>>>>>>>>> >>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>> specific which >>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>> PPC64_ONLY() >>>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs. >>>>>>>>>> >>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>> >>>>>>>>>> if (is_vol) { >>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>> #ifndef PPC64 >>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>> #else >>>>>>>>>> if (is_field) { >>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>> volatile field >>>>>>>>>> (PPC64). 
>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>> } >>>>>>>>>> #endif >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>> the torture test suite: >>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>> >>>>>>>>>>> Example: >>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>> >>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>> read(y) read(x) >>>>>>>>>>> >>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>> is only >>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>> threads >>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>> sync-load-acquire >>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>> >>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs >>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>> >>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>> fields >>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>> published. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>> seems too >>>>>>>>>>> strong. >>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>> suffice. >>>>>>>>>>> What do you think? >>>>>>>>>>> >>>>>>>>>>> Please review and test this change. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. 
>>>>>>>>>>> From david.holmes at oracle.com Thu Jan 16 01:04:41 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 16 Jan 2014 19:04:41 +1000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52D79E61.1060801@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> Message-ID: <52D7A0A9.6070208@oracle.com> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: > On 1/16/14 12:34 AM, David Holmes wrote: >> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>> want to keep it this way, it could be useful to have such info on other >>> platforms too. But I would suggest to remove PPC64 comments in >>> parse.hpp. >>> >>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>> which could be checked in all places instead of #ifdef: >> >> I asked for the ifdef some time back as I find it much preferable to >> have this as a build-time construct rather than a >> runtime one. 
I don't want to have to pay anything for this if we don't >> use it. > > Any decent C++ compiler will optimize expressions with such constants > defined in header files. I insist to avoid #ifdefs in C2 code. I really > don't like the code with #ifdef in unsafe.cpp but I can live with it. If you insist then we may as well do it all the same way. Better to be consistent. My apologies Goetz for wasting your time going back and forth on this. That aside I have a further concern with this IRIW support - it is incomplete as there is no C1 support, as PPC64 isn't using client. If this is going on then we (which probably means the Oracle 'we') need to add the missing C1 code. David ----- > Vladimir > >> >> David >> >>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>> #else >>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>> #endif >>> >>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>> >>> and then: >>> >>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>> oop p = JNIHandles::resolve(obj); \ >>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>> OrderAccess::fence(); \ >>> volatile type_name v = OrderAccess::load_acquire((volatile >>> type_name*)index_oop_from_field_offset_long(p, offset)); >>> >>> And: >>> >>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>> field->is_volatile()) { >>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>> + } >>> >>> And so on. 
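The constant-based guard proposed here can be sketched outside the VM roughly as follows. This is only an illustration under stated assumptions: std::atomic and std::atomic_thread_fence stand in for the OrderAccess operations, and volatile_load / volatile_store are hypothetical helper names, not existing HotSpot functions. The macro CPU_NOT_MULTIPLE_COPY_ATOMIC and the constant support_IRIW_for_not_multiple_copy_atomic_cpu are the names used in this thread. Because the constant is known at compile time, an optimizing compiler can drop the untaken branch, so platforms that do not define the macro should pay nothing for the check.

```cpp
#include <atomic>

#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical helper: volatile read. On a platform that opts into IRIW
// support the full fence is issued before the load (sync-load-acquire).
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return field.load(std::memory_order_acquire);
}

// Hypothetical helper: volatile write. The trailing full fence is kept
// only on platforms that do not move the fence in front of the loads.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
}
```

Building the same file with and without -DCPU_NOT_MULTIPLE_COPY_ATOMIC switches the fence placement without any #ifdef at the call sites.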
The comments will be needed only in globalDefinitions.hpp >>> >>> The code in parse1.cpp could be put on one line: >>> >>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>> method()->is_initializer()) )) { >>> >>> Thanks, >>> Vladimir >>> >>> On 1/15/14 9:25 PM, David Holmes wrote: >>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>> Hi David, >>>>> >>>>> I updated the webrev: >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>> >>>>> - I removed the IRIW example in parse3.cpp >>>>> - I adapted the comments not to point to that comment, and to >>>>> reflect the new flagging. Also I mention that we support the >>>>> volatile constructor issue, but that it's not standard. >>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>> I also think it's better to separate these this way. >>>> >>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>> declaration and all uses to be guarded by ifdef PPC64 too >>>> please. >>>> >>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>> comment doesn't make much sense to me and refers to >>>> ppc specific stuff in a shared file: >>>> >>>> if (is_volatile) { >>>> ! if (!is_store) { >>>> insert_mem_bar(Op_MemBarAcquire); >>>> ! } else { >>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>> insert_mem_bar(Op_MemBarVolatile); >>>> + #endif >>>> + } >>>> >>>> I don't think the comment is needed. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks for your comments! >>>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Mittwoch, 15. 
Januar 2014 01:55 >>>>> To: Lindenmaier, Goetz >>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>> 'hotspot-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>> Independent Reads of Independent Writes >>>>> >>>>> Hi Goetz, >>>>> >>>>> Sorry for the delay in getting back to this. >>>>> >>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>> specifically it is >>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>> the commentary excessive, particularly for shared code. In particular >>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>> explanation and I don't think we need it to that level of detail. >>>>> Seems >>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>> >>>>> The changes related to volatile writes in the constructor, as >>>>> discussed >>>>> are not required by the Java Memory Model. If you want to keep these >>>>> then I think they should all be guarded with PPC64 because it is not >>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>> PPC64 porters. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>> this version. >>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>> I also based the webrev on the latest version of the stage repo. >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -----Original Message----- >>>>>> From: Lindenmaier, Goetz >>>>>> Sent: Freitag, 20. 
Dezember 2013 13:47 >>>>>> To: David Holmes >>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi David, >>>>>> >>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>> not >>>>>>> required to pass. >>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>> worked >>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>> it by >>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>> part is >>>>>> not that performance relevant. >>>>>> >>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>> think >>>>>> I added a compile-time guard in this new webrev: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>> several double negations I don't like, (#ifNdef >>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>> but this way I only have to change the ppc platform. >>>>>> >>>>>> Best regards, >>>>>> Goetz >>>>>> >>>>>> P.S.: I will also be available over the Christmas period. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>> emails. >>>>>> >>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> ok, I understand the tests are wrong. 
It's good this issue is >>>>>>> settled. >>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>> >>>>>>> About our change: David, the causality is the other way round. >>>>>>> The change is about IRIW. >>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>> >>>>>> This is the part I still have some question marks over as the >>>>>> implications are not nice for performance on non-TSO platforms. >>>>>> But I'm >>>>>> no further along in processing that paper I'm afraid. >>>>>> >>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>> stores. >>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>> constructor tests. >>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>> after stores >>>>>>> to pass the volatile constructor tests. >>>>>> >>>>>> So we can at least undo #4 now we have established those tests >>>>>> were not >>>>>> required to pass. >>>>>> >>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>> order >>>>>>> instructions are not as fine-grained as the >>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>> The only instruction >>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>> therefore the >>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>> proper representation >>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>> it's pointless >>>>>>> anyways. >>>>>>> >>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>> I'd be happy to add a property >>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>> >>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>> think >>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>> based not architecture based) as that will allow for turning this >>>>>> on/off for any architecture for testing purposes.
>>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>> Or also >>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>> >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> TL;DR version: >>>>>>> >>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>> constructor-barrier >>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>> *may* >>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>> ensure >>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>> invalid >>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>> need it >>>>>>> due to other factors). >>>>>>> >>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>> term :) >>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>> >>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>> visible to >>>>>>> all other hardware threads at the same time point; these >>>>>>> architectures >>>>>>> are not multiple-copy atomic." >>>>>>> >>>>>>> This is the visibility issue that I referred to and affects both >>>>>>> ARM and >>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>> after the stores that need to be visible. 
I think the crux of the >>>>>>> current issue is what you wrote below: >>>>>>> >>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>> > remove the sync instruction from behind stores >>>>>>> (parse3.cpp:320) >>>>>>> > and place it before loads. >>>>>>> >>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>> after >>>>>>> the store then you have to do something around the loads to get the >>>>>>> same >>>>>>> results! I still don't know what led you to the conclusion that the >>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>> load - >>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>> clearer. >>>>>>> >>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>> variables. You now want to customize that but not all the requisite >>>>>>> hooks are present. It would be better if volatile_load and >>>>>>> volatile_store were factored out so that they could be >>>>>>> implemented as >>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>> hooks >>>>>>> that could then be customized per platform. Otherwise you need >>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>> >>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>> this >>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>> abstractions are a confused mess in my opinion.
>>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>> >>>>>>>> I did not yet add the >>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>> or >>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>> >>>>>>>> >>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>> I feel not in the position to contribute substantially. >>>>>>>> >>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>> >>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>> store >>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>> (volatile store in constructor). >>>>>>>> >>>>>>>> >>>>>>>> @David >>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>> taskqueue problem yet, >>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>> continuous. >>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>> issue. >>>>>>>> And we came up with a solution we consider the best possible. 
If >>>>>>>> you >>>>>>>> have objections, you should at least give the draft of a better >>>>>>>> solution, >>>>>>>> we would volunteer to implement and test it. >>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>> issues. >>>>>>>> >>>>>>>> @David >>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>> and >>>>>>>>> can't find any reference to it. >>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>> ARM and >>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>> Work-Stealing for >>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>> taskqueue problem. >>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>> >>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>> used 'read' >>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>> above). >>>>>>>> >>>>>>>> Best regards and thanks for all your involvements, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi Goetz, >>>>>>>> >>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> -- Volatile in constructor >>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>> missing constructor barrier. >>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>> processors >>>>>>>>> and are Power 5-7.
But see also Aleksey's mail. (Thanks >>>>>>>>> Aleksey!) >>>>>>>> >>>>>>>> And see follow ups - the tests are invalid. >>>>>>>> >>>>>>>>> -- IRIW issue >>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>> a few >>>>>>>>>> moments thought. >>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>> problem yet, >>>>>>>>> and that's being discussed now for almost a year. >>>>>>>> >>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>> continuous. >>>>>>>> >>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>> different?) >>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>> I contributed a >>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>> nice, I admit. >>>>>>>>> (I don't really know about ARM, though). >>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>> are evaluated >>>>>>>>> by the C-compiler I'm happy. >>>>>>>>> >>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>> problem >>>>>>>>> is that >>>>>>>>> store >>>>>>>>> sync >>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>> only >>>>>>>>> sync >>>>>>>>> load >>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>> pass that test. >>>>>>>> >>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>> term and >>>>>>>> can't find any reference to it. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> The JMM is fine. And >>>>>>>>> store >>>>>>>>> MemBarVolatile >>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>> that >>>>>>>>> do what is required. 
>>>>>>>>> >>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi everybody, >>>>>>>>>> >>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>> I'll try to answer to all in one mail. >>>>>>>>>> >>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>> initializing >>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>> Shipilev >>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>> Also, concurrency torture tests >>>>>>>>>> LongVolatileTest >>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>> will fail. >>>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>>> volatile field would be >>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>> AtomicInteger.) >>>>>>>>> >>>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>>> volatiles do >>>>>>>>> not add anything special here. 
AFAIK there is nothing in the JMM >>>>>>>>> that >>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>> Aleksey >>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>> missing constructor barrier. >>>>>>>>> >>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>> heavyweight >>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>> don't >>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>> >>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>> OrderAccess operations >>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>> account >>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>> different platforms >>>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>>> An example: >>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>> this: >>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>> Assembler::StoreStore) >>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>> Just doing what is required. >>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>> fine grained operations: >>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>> are more (too) powerful, >>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>> can not implement >>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>> >>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>> issue. 
>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>> read >>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>> >>>>>>>>>> I would propose to guard the code by >>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>> independent? >>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>> >>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>> few >>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>> will >>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>> moment. The >>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>> requirements for >>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>> are not >>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>> store >>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>> of the >>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>> software >>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>> don't have >>>>>>>>> that on ppc!) >>>>>>>>> >>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>> I will >>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> -- Other ports: >>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>> it might >>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>> them). 
>>>>>>>>>> >>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>> can >>>>>>>>>> prepare an extra webrev for hotspot-comp. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>> way :) >>>>>>>>>> >>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>> >>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>> fix an >>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>> IRIW >>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>> doing the >>>>>>>>>> load. >>>>>>>>>> >>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>> would be >>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>> performance >>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>> that >>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>> this is >>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>> with the >>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>> the >>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>> properties >>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>> on a >>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>> reduce >>>>>>>>>> to a no-op. >>>>>>>>>> >>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>> under >>>>>>>>>> the JMM. >>>>>>>>>> >>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>> current >>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>> is set >>>>>>>>>>> depending on platform: >>>>>>>>>>> >>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> or similar to what we use now: >>>>>>>>>>> >>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>> >>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>> Java >>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>> explicitly >>>>>>>>>> like this. >>>>>>>>>> >>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>> there >>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>> defines? >>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>> David? >>>>>>>>>> >>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>> inside else {}. >>>>>>>>>>> >>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>> specific which >>>>>>>>>>> will be set to 'true' only on ppc64. 
Then you will not need >>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>> #ifdefs. >>>>>>>>>>> >>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>> >>>>>>>>>>> if (is_vol) { >>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>> #else >>>>>>>>>>> if (is_field) { >>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>> volatile field >>>>>>>>>>> (PPC64). >>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>> } >>>>>>>>>>> #endif >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>> the torture test suite: >>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>> >>>>>>>>>>>> Example: >>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>> >>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>> >>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>> is only >>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>> threads >>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>> >>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>> fails on staxf" for taskqueue.hpp. 
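The IRIW example above can be tried out at the Java level with a small harness. What follows is only a minimal sketch under stated assumptions: it is not the VolatileIRIWTest from the torture suite, the class and method names (`IriwSketch`, `runOnce`, `countForbidden`) are made up for illustration, and a handful of iterations is far too few to expose a real ordering bug. It merely shows the shape of the test: two writer threads, two reader threads that read `x` and `y` in opposite order, and a check for the disallowed outcome.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of the IRIW litmus test; not taken from the webrev
// or from the concurrency torture suite.
final class IriwSketch {

    // Fresh state per iteration: two volatile fields plus one result slot per read.
    static final class State {
        volatile int x;
        volatile int y;
        int r1, r2, r3, r4; // each written by one reader, read only after join()
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs the four threads once; returns true if the disallowed outcome was seen.
    static boolean runOnce() {
        final State s = new State();
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = {
            new Thread(() -> { await(start); s.x = 1; }),                   // Thread 0
            new Thread(() -> { await(start); s.r1 = s.x; s.r2 = s.y; }),    // Thread 1
            new Thread(() -> { await(start); s.y = 1; }),                   // Thread 2
            new Thread(() -> { await(start); s.r3 = s.y; s.r4 = s.x; })     // Thread 3
        };
        for (Thread t : threads) {
            t.start();
        }
        start.countDown(); // release all four threads at (roughly) the same time
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Disallowed: Thread 1 sees x=1, y=0 while Thread 3 sees y=1, x=0.
        return s.r1 == 1 && s.r2 == 0 && s.r3 == 1 && s.r4 == 0;
    }

    // Counts how often the disallowed outcome shows up in the given number of runs.
    static int countForbidden(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            if (runOnce()) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

Because volatile accesses are sequentially consistent under the JMM, a conforming VM should never report a forbidden outcome here. A real stress test keeps long-running threads and many more iterations; starting threads per iteration, as this sketch does, makes the interesting interleavings rare.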
>>>>>>>>>>>> >>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>> fields >>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>> published. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>> seems too >>>>>>>>>>>> strong. >>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>> suffice. >>>>>>>>>>>> What do you think? >>>>>>>>>>>> >>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> From volker.simonis at gmail.com Thu Jan 16 01:38:55 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 16 Jan 2014 10:38:55 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <52D51FAC.8060800@oracle.com> <52D629A0.4080806@oracle.com> <52D64ED5.4020409@oracle.com> Message-ID: On Wed, Jan 15, 2014 at 12:05 PM, Volker Simonis wrote: > > > > On Wed, Jan 15, 2014 at 10:03 AM, Alan Bateman wrote: > >> On 15/01/2014 06:24, David Holmes wrote: >> >>> >>> I'm not a fan of runtime checks of this kind though if it is only a very >>> samll number of values it might be okay. >>> >>> Another option would be to make those classes into "templates" as done >>> with Version.java.template and substitute the right values at build time. >>> >>> But I'll let Alan and net-dev folk come back with their preferred >>> technique for this. >>> >>> I plan to spend time on Volker's webrev later in the week (just too >> busy with other things right now). For the translation issue then it's an >> oversight in the original implementation, it just hasn't come up before (to >> my knowledge anyway). The simplest solution here maybe to to just move them >> to sun.net.ch.Net and have them initialized to their native value. > > > Do you mean sun.nio.ch.Net right? 
> > Do you propose to completely remove the definitions of the POLL constants > from: > > src/share/classes/sun/nio/ch/AbstractPollArrayWrapper.java > src/solaris/classes/sun/nio/ch/Port.java > > and replace all their usages by Net.POLL* ? > > Hi Alan, I think sun.nio.ch.IOUtil seems even more appropriate to me for these constants. What do you think? Would it be OK for you if I initialize them right in the static initializer of IOUtil based on "os.name" or do you prefer to have native methods which return the right constants? Regards, Volker > In general then I'm not too concerned about that one, it's the changes to >> support async close on AIX that are leaping out at me. >> >> -Alan >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140116/cacb3907/attachment.html From Alan.Bateman at oracle.com Thu Jan 16 02:05:44 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 16 Jan 2014 10:05:44 +0000 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <52D51FAC.8060800@oracle.com> <52D629A0.4080806@oracle.com> <52D64ED5.4020409@oracle.com> Message-ID: <52D7AEF8.8090502@oracle.com> On 16/01/2014 09:38, Volker Simonis wrote: > > > Hi Alan, > > I think sun.nio.ch.IOUtil seems even more appropriate to me for these > constants. What do you think? > > Would it be OK for you if I initialize them right in the static > initializer of IOUtil based on "os.name " or do you > prefer to have native methods which return the right constants? I have a small preference for sun.nio.ch.Net because these constants are used with Net.poll. Would you be open to separating this one from the AIX changes? The reason is that it isn't AIX specific, rather just an oversight that hasn't been an issue because it doesn't impact other platforms. Using os.name initially would be okay although we could change that over time. 
-Alan

From volker.simonis at gmail.com Thu Jan 16 02:34:03 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Thu, 16 Jan 2014 11:34:03 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: <52D7AEF8.8090502@oracle.com>
References: <52D51FAC.8060800@oracle.com> <52D629A0.4080806@oracle.com> <52D64ED5.4020409@oracle.com> <52D7AEF8.8090502@oracle.com>
Message-ID:

On Thu, Jan 16, 2014 at 11:05 AM, Alan Bateman wrote:
> On 16/01/2014 09:38, Volker Simonis wrote:
>
> Hi Alan,
>
> I think sun.nio.ch.IOUtil seems even more appropriate to me for these
> constants. What do you think?
>
> Would it be OK for you if I initialize them right in the static initializer
> of IOUtil based on "os.name" or do you prefer to have native methods which
> return the right constants?
>
> I have a small preference for sun.nio.ch.Net because these constants are
> used with Net.poll.

I just thought so because poll is more file-descriptor oriented than network specific. And the constants are also used, for example, in:

src/macosx/classes/sun/nio/ch/KQueueArrayWrapper.java
src/solaris/classes/sun/nio/ch/sctp/Sctp*
src/solaris/classes/sun/nio/ch/Port.java

But actually I have no preference here, so I can just as well put them into sun.nio.ch.Net.

> Would you be open to separating this one from the AIX
> changes? The reason is that it isn't AIX specific, rather just an oversight
> that hasn't been an issue because it doesn't impact other platforms.

Sure, no problem. Although I still would like to push this to ppc-aix-port/stage-9 and ppc-aix-port/stage first because that's where we really need it.
Anyway, the current plan is to merge ppc-aix-port/stage-9 into jdk9 mainline by the end of January and ppc-aix-port/stage into 8u-dev by the end of March (for 8u20). Would that be ok?

> Using
> os.name initially would be okay although we could change that over time.

I've already written the native methods :)

>
> -Alan

From Alan.Bateman at oracle.com Thu Jan 16 08:51:55 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Thu, 16 Jan 2014 16:51:55 +0000
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To:
References: <52D51FAC.8060800@oracle.com> <52D629A0.4080806@oracle.com> <52D64ED5.4020409@oracle.com> <52D7AEF8.8090502@oracle.com>
Message-ID: <52D80E2B.5000608@oracle.com>

On 16/01/2014 10:34, Volker Simonis wrote:
> :
> I just thought because poll is more file-descriptor oriented and not
> network specific. And the constants are also used for example in:
>
> src/macosx/classes/sun/nio/ch/KQueueArrayWrapper.java:
> src/solaris/classes/sun/nio/ch/sctp/Sctp*
> src/solaris/classes/sun/nio/ch/Port.java
>
> But actually I have no preference here so I can put them just as well
> to sun.nio.ch.Net

It's not used for anything except sockets here (there aren't any selectable channels that aren't also network channels). So I think sun.nio.ch.Net is marginally cleaner here.

> :
>
> Sure, no problem. Although I still would like to push this to
> ppc-aix-port/stage-9 and ppc-aix-port/stage first because that's where
> we really need it. Anyway, the current plan is to merge
> ppc-aix-port/stage-9 into jdk9 mainline by the end of January and
> ppc-aix-port/stage into 8u-dev by the end of March (for 8u20). Would
> that be ok?
>

I see you've created a bug for this. I guess it's okay if it comes via the ppc port, although it would still be good to get it into jdk9/dev early as it impacts all platforms.

-Alan.
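For illustration, the "initialize them based on os.name" variant discussed in this exchange might look roughly like the sketch below. This is an assumption-laden sketch, not the code that was pushed: the class name `PollConstants` and its layout are hypothetical, and the AIX values in particular are an unverified assumption that should be checked against `<sys/poll.h>` on a real AIX system. The default values are the ones used by the Linux, Solaris and Mac OS X headers.

```java
// Hypothetical sketch of selecting the POLL* event constants from "os.name".
// The class name and structure are illustrative only.
final class PollConstants {
    final int pollIn;
    final int pollOut;
    final int pollErr;
    final int pollHup;
    final int pollNval;

    private PollConstants(int pollIn, int pollOut, int pollErr, int pollHup, int pollNval) {
        this.pollIn = pollIn;
        this.pollOut = pollOut;
        this.pollErr = pollErr;
        this.pollHup = pollHup;
        this.pollNval = pollNval;
    }

    // Picks the constant set for the given value of the "os.name" property.
    static PollConstants forOs(String osName) {
        if ("AIX".equals(osName)) {
            // ASSUMPTION: AIX values quoted from memory; verify against <sys/poll.h>.
            return new PollConstants(0x0001, 0x0002, 0x4000, 0x2000, 0x8000);
        }
        // Values shared by the Linux, Solaris and Mac OS X headers.
        return new PollConstants(0x0001, 0x0004, 0x0008, 0x0010, 0x0020);
    }

    // Resolved once, when the class is initialized.
    static final PollConstants CURRENT = forOs(System.getProperty("os.name"));
}
```

The alternative Volker mentions, native methods that simply return the values of the C macros, avoids hard-coding any numbers on the Java side, at the cost of a few extra JNI entry points.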
From vladimir.kozlov at oracle.com Thu Jan 16 10:15:39 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 16 Jan 2014 10:15:39 -0800 Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization In-Reply-To: <52D76E6F.8070504@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> Message-ID: <52D821CB.6020207@oracle.com> Changes are in C++ Interpreter so it does not affect Oracle VM. But David has point here. I would like to hear the explanation too. BTW, I see that for ppc64: src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false); as result write_memory_serialize_page() is used in ThreadStateTransition::transition(). Is it not enough on PPC64? Thanks, Vladimir On 1/15/14 9:30 PM, David Holmes wrote: > Can I get some response on this please - specifically the redundancy wrt > IRT_ENTRY actions. > > Thanks, > David > > On 20/12/2013 11:55 AM, David Holmes wrote: >> Still catching up ... >> >> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> this change adds StoreStore barriers after object initialization and >>> after constructor calls in the C++ interpreter. This assures no >>> uninitialized >>> objects or final fields are visible. >>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/ >> >> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize >> thread state transitions that already include a full "fence" so the >> storestore barriers are redundant in those cases. >> >> The fastpath _new storestore seems okay. >> >> I don't know how handle_return gets used to know if it is reasonable or >> not. 
>>
>> I was trying, unsuccessfully, to examine the same code in the
>> templateInterpreter to see how it handles these cases as it naturally
>> has the same object-initialization-safety requirements (though these can
>> be handled in a number of different ways other than an unconditional
>> storestore barrier at the end of the initialization and construction
>> phases.
>>
>> David
>> -----
>>
>>> Please review and test this change.
>>>
>>> Best regards,
>>> Goetz.
>>>

From philip.race at oracle.com Thu Jan 16 10:53:09 2014
From: philip.race at oracle.com (Phil Race)
Date: Thu, 16 Jan 2014 10:53:09 -0800
Subject: [OpenJDK 2D-Dev] RFR(S): JDK-8031134 : PPC64: implement printing on AIX
In-Reply-To:
References:
Message-ID: <52D82A95.7030803@oracle.com>

Hello Volker,

Interesting that all this is needed. How has AIX got by before? Is this taken from an existing IBM port or did you write this yourself? I'd hope you are getting help directly from IBM in this area.

I suppose that if CUPS is configured and running it'll take precedence over these as it does for the other cases. Someone with AIX should test that at some point as it's the only way to know for sure that that works.

I'd hoped that AIX would be able to fit into either the BSD or SysV printing mold, but it seems like it's got its own special commands: lsallq is a new one on me and looks like an AIX special, the "-W" arg to lpstat is completely different from what it means on OS X or Linux, and there's the oddity that it doesn't expect spaces between the flag and the value. So I think your approach is the best one.

A few specific comments:

    } else if (aixPrinterEnumerator.equalsIgnoreCase("lsallq")) {
        aix_defaultPrinterEnumeration = aix_lsallq;
    }

This code seems redundant since it's just reasserting the default, unless you intend that this be something that can change multiple times. I'd advise against that, and anyway it's in a static block so that won't happen.
I see you defined

    static boolean isAIX( ) {
        return osname.equals("AIX");
    }

so why are you using this here:

    if (osname.equals("AIX")) {

instead of calling isAIX() as you do elsewhere?

...
    } else if (names.length != 1) {
        // No default printer found

In the other cases we chose to nominate the 1st printer as the default. This seemed a better choice than making apps deal with a list of installed printers but no default. Not sure what problems you might encounter with this (if any). If the situation never occurs it's not an issue.

-phil.

On 1/16/14 12:08 AM, Volker Simonis wrote:
> Resending one more time to 2d-dev upon request:
>
> Hi,
>
> could somebody please review the following small change:
>
> http://cr.openjdk.java.net/~simonis/webrevs/8031134/
>
> It's the straight forward implementation of the basic printing
> infrastructure on AIX and shouldn't have any impact on the existing
> platforms. As always, this change is intended for the
> http://hg.openjdk.java.net/ppc-aix-port/stage/jdk repository.
>
> Thank you and best regards,
> Volker

From david.holmes at oracle.com Thu Jan 16 18:21:11 2014
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 17 Jan 2014 12:21:11 +1000
Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization
In-Reply-To: <52D821CB.6020207@oracle.com>
References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> <52D821CB.6020207@oracle.com>
Message-ID: <52D89397.2030903@oracle.com>

On 17/01/2014 4:15 AM, Vladimir Kozlov wrote:
> Changes are in C++ Interpreter so it does not affect Oracle VM.
> But David has point here. I would like to hear the explanation too.
>
> BTW, I see that for ppc64:
>
> src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false);
>
> as result write_memory_serialize_page() is used in
> ThreadStateTransition::transition().
>
> Is it not enough on PPC64?
You would need to consult a PPC64+linux expert to be certain but what I was basically told is that the serialization page mechanism is only guaranteed to work for TSO systems. David ----- > Thanks, > Vladimir > > On 1/15/14 9:30 PM, David Holmes wrote: >> Can I get some response on this please - specifically the redundancy wrt >> IRT_ENTRY actions. >> >> Thanks, >> David >> >> On 20/12/2013 11:55 AM, David Holmes wrote: >>> Still catching up ... >>> >>> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> this change adds StoreStore barriers after object initialization and >>>> after constructor calls in the C++ interpreter. This assures no >>>> uninitialized >>>> objects or final fields are visible. >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/ >>> >>> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize >>> thread state transitions that already include a full "fence" so the >>> storestore barriers are redundant in those cases. >>> >>> The fastpath _new storestore seems okay. >>> >>> I don't know how handle_return gets used to know if it is reasonable or >>> not. >>> >>> I was trying, unsuccessfully, to examine the same code in the >>> templateInterpreter to see how it handles these cases as it naturally >>> has the same object-initialization-safety requirements (though these can >>> be handled in a number of different ways other than an unconditional >>> storestore barrier at the end of the initialization and construction >>> phases. >>> >>> David >>> ----- >>> >>>> Please review and test this change. >>>> >>>> Best regards, >>>> Goetz. 
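The ordering that the StoreStore barrier in this change is meant to provide (initializing stores become visible before the store that publishes the reference) can be written down at the Java level as well. The sketch below is only an analogy under stated assumptions: it needs Java 9 or later for `VarHandle.storeStoreFence()`, which did not exist when this thread was written, and the names `PublicationSketch`, `Box` and `publish` are hypothetical. It shows the writer side only; it is not how the C++ interpreter implements the barrier.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration of "initialize, StoreStore barrier, then publish".
final class PublicationSketch {

    static final class Box {
        int value; // plain (non-final, non-volatile) field
    }

    // Plain field on purpose: the reference is published through a data race.
    static Box shared;

    static void publish(int v) {
        Box b = new Box();
        b.value = v;                 // initializing store
        VarHandle.storeStoreFence(); // earlier stores are ordered before later stores
        shared = b;                  // publishing store
    }
}
```

Without the fence, a weakly ordered machine such as PPC64 may make the publishing store visible to another processor before the initializing store, which is the situation the interpreter change guards against for freshly allocated objects and final fields.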
>>>>

From goetz.lindenmaier at sap.com Fri Jan 17 00:39:05 2014
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 17 Jan 2014 08:39:05 +0000
Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
In-Reply-To: <52D7A0A9.6070208@oracle.com>
References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293F087.2080700@oracle.com> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap>

Hi,

I tried to come up with a webrev that implements the change as proposed in your mails:
http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/

Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I now use support_IRIW_for_not_multiple_copy_atomic_cpu.

I left the definition and handling of _wrote_volatile in the code, without any protection.
I protected issuing the barrier for volatile in constructors with PPC64_ONLY(), and put it on one line.

I removed the comment in library_call.cpp.
I also removed the sentence "Solution: implement volatile read as sync-load-acquire." from the comments as it's PPC specific.

Wrt. C1: we plan to port C1 to PPC64, too.
During that task, we will fix these issues in C1 if nobody has done so by then.

Wrt. performance: Oracle will soon do heavy testing of the port. If any performance problems arise, we can still add #ifdef PPC64 to circumvent this.

Best regards,
Goetz.

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Thursday, 16 January 2014 10:05
To: Vladimir Kozlov
Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
> On 1/16/14 12:34 AM, David Holmes wrote:
>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>> want to keep it this way, it could be useful to have such info on other
>>> platforms too. But I would suggest to remove PPC64 comments in
>>> parse.hpp.
>>>
>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>> which could be checked in all places instead of #ifdef:
>>
>> I asked for the ifdef some time back as I find it much preferable to
>> have this as a build-time construct rather than a
>> runtime one. I don't want to have to pay anything for this if we don't
>> use it.
>
> Any decent C++ compiler will optimize expressions with such constants
> defined in header files. I insist to avoid #ifdefs in C2 code. I really
> don't like the code with #ifdef in unsafe.cpp but I can live with it.

If you insist then we may as well do it all the same way. Better to be consistent. My apologies Goetz for wasting your time going back and forth on this.

That aside I have a further concern with this IRIW support - it is incomplete as there is no C1 support, as PPC64 isn't using client. If this is going on then we (which probably means the Oracle 'we') need to add the missing C1 code.
David ----- > Vladimir > >> >> David >> >>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>> #else >>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>> #endif >>> >>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>> >>> and then: >>> >>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>> oop p = JNIHandles::resolve(obj); \ >>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>> OrderAccess::fence(); \ >>> volatile type_name v = OrderAccess::load_acquire((volatile >>> type_name*)index_oop_from_field_offset_long(p, offset)); >>> >>> And: >>> >>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>> field->is_volatile()) { >>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>> + } >>> >>> And so on. The comments will be needed only in globalDefinitions.hpp >>> >>> The code in parse1.cpp could be put on one line: >>> >>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>> method()->is_initializer()) )) { >>> >>> Thanks, >>> Vladimir >>> >>> On 1/15/14 9:25 PM, David Holmes wrote: >>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>> Hi David, >>>>> >>>>> I updated the webrev: >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>> >>>>> - I removed the IRIW example in parse3.cpp >>>>> - I adapted the comments not to point to that comment, and to >>>>> reflect the new flagging. Also I mention that we support the >>>>> volatile constructor issue, but that it's not standard. >>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>> I also think it's better to separate these this way. >>>> >>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>> declaration and all uses to be guarded by ifdef PPC64 too >>>> please. >>>> >>>> One nit I missed before. 
In src/share/vm/opto/library_call.cpp this >>>> comment doesn't make much sense to me and refers to >>>> ppc specific stuff in a shared file: >>>> >>>> if (is_volatile) { >>>> ! if (!is_store) { >>>> insert_mem_bar(Op_MemBarAcquire); >>>> ! } else { >>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>> insert_mem_bar(Op_MemBarVolatile); >>>> + #endif >>>> + } >>>> >>>> I don't think the comment is needed. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks for your comments! >>>>> >>>>> Best regards, >>>>> Goetz. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>>> To: Lindenmaier, Goetz >>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>> 'hotspot-dev at openjdk.java.net' >>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>> Independent Reads of Independent Writes >>>>> >>>>> Hi Goetz, >>>>> >>>>> Sorry for the delay in getting back to this. >>>>> >>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>> specifically it is >>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>> the commentary excessive, particularly for shared code. In particular >>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>> explanation and I don't think we need it to that level of detail. >>>>> Seems >>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>> >>>>> The changes related to volatile writes in the constructor, as >>>>> discussed >>>>> are not required by the Java Memory Model. If you want to keep these >>>>> then I think they should all be guarded with PPC64 because it is not >>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>> PPC64 porters. 
>>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>> this version. >>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>> I also based the webrev on the latest version of the stage repo. >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -----Original Message----- >>>>>> From: Lindenmaier, Goetz >>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>> To: David Holmes >>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi David, >>>>>> >>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>> not >>>>>>> required to pass. >>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>> worked >>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>> it by >>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>> part is >>>>>> not that performance relevant. >>>>>> >>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>> think >>>>>> I added a compile-time guard in this new webrev: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>> several double negations I don't like, (#ifNdef >>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>> but this way I only have to change the ppc platform. >>>>>> >>>>>> Best regards, >>>>>> Goetz >>>>>> >>>>>> P.S.: I will also be available over the Christmas period. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Freitag, 20. 
Dezember 2013 05:58 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>> emails. >>>>>> >>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>> settled. >>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>> >>>>>>> About our change: David, the causality is the other way round. >>>>>>> The change is about IRIW. >>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>> >>>>>> This is the part I still have some question marks over as the >>>>>> implications are not nice for performance on non-TSO platforms. >>>>>> But I'm >>>>>> no further along in processing that paper I'm afraid. >>>>>> >>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>> stores. >>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>> constructor tests. >>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>> after stores >>>>>>> to pass the volatile constructor tests. >>>>>> >>>>>> So we can at least undo #4 now we have established those tests >>>>>> were not >>>>>> required to pass. >>>>>> >>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>> order >>>>>>> instructions are not as find-granular as the >>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>> The only instruction >>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>> therefore the >>>>>>> MemBarVolatile after the store fixes the constructor tests. 
The >>>>>>> proper representation >>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>> it's pointless >>>>>>> anyways. >>>>>>> >>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>> I'd be happy to add a property >>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>> >>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>> think >>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>> based not architecture based) as that will allows for turning this >>>>>> on/off for any architecture for testing purposes. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>> Or also >>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>> >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> TL;DR version: >>>>>>> >>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>> constructor-barrier >>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>> *may* >>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>> ensure >>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>> invalid >>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>> need it >>>>>>> due to other factors). 
>>>>>>> >>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>> term :) >>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>> >>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>> visible to >>>>>>> all other hardware threads at the same time point; these >>>>>>> architectures >>>>>>> are not multiple-copy atomic." >>>>>>> >>>>>>> This is the visibility issue that I referred to and affects both >>>>>>> ARM and >>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>> current issue is what you wrote below: >>>>>>> >>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>> > remove the sync instruction from behind stores >>>>>>> (parse3.cpp:320) >>>>>>> > and place it before loads. >>>>>>> >>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>> after >>>>>>> the store then you have to do something around the loads to get the >>>>>>> same >>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>> load - >>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>> clearer. >>>>>>> >>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>> variables. You now want to customize that but not all the requisite >>>>>>> hooks are present. It would be better if volatile_load and >>>>>>> volatile_store were factored out so that they could be >>>>>>> implemented as >>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>> hooks >>>>>>> that could then be customized per platform. 
Otherwise you need >>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>> >>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>> this >>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>> abstractions are a confused mess in my opinion. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>> >>>>>>>> I did not yet add the >>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>> or >>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>> >>>>>>>> >>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>> I feel not in the position to contribute substantially. >>>>>>>> >>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>> >>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>> store >>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>> (volatile store in constructor). >>>>>>>> >>>>>>>> >>>>>>>> @David >>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>> taskqueue problem yet, >>>>>>>>>> and that's being discussed now for almost a year. 
>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>> continuous. >>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>> issue. >>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>> you >>>>>>>> have objections, you should at least give the draft of a better >>>>>>>> solution, >>>>>>>> we would volunteer to implement and test it. >>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>> issues. >>>>>>>> >>>>>>>> @David >>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>> and >>>>>>>>> can't find any reference to it. >>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>> ARM and >>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>> Work-Stealing for >>>>>>>> Weak Memory Models" by Nhat Minh L?, Antoniu Pop, Albert Cohen >>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the >>>>>>>> taskqueue problem. >>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>> >>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>> used 'read' >>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>> above). >>>>>>>> >>>>>>>> Best regards and thanks for all your involvements, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Mittwoch, 27. 
November 2013 12:53 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi Goetz, >>>>>>>> >>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> -- Volatile in constuctor >>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>> missing constructor barrier. >>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>> processors >>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>> Aleksey!) >>>>>>>> >>>>>>>> And see follow ups - the tests are invalid. >>>>>>>> >>>>>>>>> -- IRIW issue >>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>> a few >>>>>>>>>> moments thought. >>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>> problem yet, >>>>>>>>> and that's being discussed now for almost a year. >>>>>>>> >>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>> continuous. >>>>>>>> >>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>> different?) >>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>> I contributed a >>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>> nice, I admit. >>>>>>>>> (I don't really know about ARM, though). >>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>> are evaluated >>>>>>>>> by the C-compiler I'm happy. 
>>>>>>>>> >>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>> problem >>>>>>>>> is that >>>>>>>>> store >>>>>>>>> sync >>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>> only >>>>>>>>> sync >>>>>>>>> load >>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>> pass that test. >>>>>>>> >>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>> term and >>>>>>>> can't find any reference to it. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> The JMM is fine. And >>>>>>>>> store >>>>>>>>> MemBarVolatile >>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>> that >>>>>>>>> do what is required. >>>>>>>>> >>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi everybody, >>>>>>>>>> >>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>> I'll try to answer to all in one mail. >>>>>>>>>> >>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>> initializing >>>>>>>>>> a volatile field. 
Previously, we discussed this with Aleksey >>>>>>>>>> Shipilev >>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>> Also, concurrency torture tests >>>>>>>>>> LongVolatileTest >>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>> will fail. >>>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>>> volatile field would be >>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>> AtomicInteger.) >>>>>>>>> >>>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>>> volatiles do >>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>> that >>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>> Aleksey >>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>> missing constructor barrier. >>>>>>>>> >>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>> heavyweight >>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>> don't >>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>> >>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>> OrderAccess operations >>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>> account >>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>> different platforms >>>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>>> An example: >>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>> this: >>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>> Assembler::StoreStore) >>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>> Just doing what is required. 
>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>> fine grained operations: >>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>> are more (too) powerful, >>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>> can not implement >>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>> >>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>> issue. >>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>> read >>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>> >>>>>>>>>> I would propose to guard the code by >>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>> independent? >>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>> >>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>> few >>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>> will >>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>> moment. The >>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>> requirements for >>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>> are not >>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>> store >>>>>>>>> would deal with that, not at the load. 
(This situation reminds me >>>>>>>>> of the >>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>> software >>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>> don't have >>>>>>>>> that on ppc!) >>>>>>>>> >>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>> I will >>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> -- Other ports: >>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>> it might >>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>> them). >>>>>>>>>> >>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>> can >>>>>>>>>> prepare an extra webrev for hotspot-comp. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>> way :) >>>>>>>>>> >>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>> I have to ask David to do correctness evaluation. 
>>>>>>>>>> >>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>> fix an >>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>> IRIW >>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>> doing the >>>>>>>>>> load. >>>>>>>>>> >>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>> would be >>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>> performance >>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>> that >>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>> this is >>>>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>>>> with the >>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>> the >>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>> properties >>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>> on a >>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>> reduce >>>>>>>>>> to a no-op. >>>>>>>>>> >>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>> under >>>>>>>>>> the JMM. >>>>>>>>>> >>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>> current >>>>>>>>>>> code for our platforms (please, do not change do_exits() now). 
>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>> is set >>>>>>>>>>> depending on platform: >>>>>>>>>>> >>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> or similar to what we use now: >>>>>>>>>>> >>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>> >>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>> Java >>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>> explicitly >>>>>>>>>> like this. >>>>>>>>>> >>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>> there >>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>> defines? >>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>> David? >>>>>>>>>> >>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>> inside else {}. >>>>>>>>>>> >>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>> specific which >>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>> #ifdefs. >>>>>>>>>>> >>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>> >>>>>>>>>>> if (is_vol) { >>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>> #else >>>>>>>>>>> if (is_field) { >>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>> volatile field >>>>>>>>>>> (PPC64). 
>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>> } >>>>>>>>>>> #endif >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>> the torture test suite: >>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>> >>>>>>>>>>>> Example: >>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>> >>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>> >>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>> is only >>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>> threads >>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>> >>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>> >>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>> fields >>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>> published. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>> seems too >>>>>>>>>>>> strong. >>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>> suffice. >>>>>>>>>>>> What do you think? >>>>>>>>>>>> >>>>>>>>>>>> Please review and test this change. 
>>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> From Alan.Bateman at oracle.com Fri Jan 17 01:48:46 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 09:48:46 +0000 Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX In-Reply-To: References: <52D58C55.6070004@oracle.com> Message-ID: <52D8FC7E.5080908@oracle.com> On 15/01/2014 16:42, Volker Simonis wrote: > Hi Alan, > > thanks for the suggestion. That's fine for me. I've copied the empty > SCTP stubs from the macosx to the aix directory as well and updated > the make file accordingly (in the patch for "8031581: PPC64: Addons > and fixes for AIX to pass the jdk regression tests"). > > Therefore, the changes to the three tests: > > test/com/sun/nio/sctp/SctpChannel/Util.java > test/com/sun/nio/sctp/SctpMultiChannel/Util.java > test/com/sun/nio/sctp/SctpServerChannel/Util.java > > can be considered obsolete. Thanks, I think this makes the most sense. I looked through the rest of this webrev and the update to the tests looks fine. One general comment is that for many of these shell tests (that survive the current effort to replace them) is that we could move the Unix handling into the match any case so that we don't have to list each of Linux, SunOS, Darwin, ... I think this came up when the OS X port was brought in but there wasn't any follow-up on it. I am not suggesting you do this here, it's just a comment as I see same change to so many tests. A minor comment on SBC.java is that it could just catch UnsupportedOperationException on L238, that would avoid needing to check os.name. A really minor comment on the updates to ProblemList.txt is that the JMX test should probably be in the jdk_jmx section (it's just a convention that we've been using, it doesn't of course really matter where tests are listed). 
-Alan From volker.simonis at gmail.com Fri Jan 17 03:25:24 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 12:25:24 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant Message-ID: Hi, could you please review the following change which makes the various POLL constants used in sun.nio.ch platform dependent: http://cr.openjdk.java.net/~simonis/webrevs/8031997/ These changes are currently targeted for the ppc-aix-port/stage-9 repository but it is planned to merge them soon into jdk9 and jdk8u-dev (for 8u20). Currently, various constants used for the poll/epoll/pollset system calls are defined multiple times as public static final short constants in various Java files: src/share/classes/sun/nio/ch/AbstractPollArrayWrapper.java src/solaris/classes/sun/nio/ch/Port.java Until now, this has not been a problem because on Linux, Solaris and MacOSX these constants have the same values. However, on Windows and AIX they are different. While this hasn't been a problem on Windows either, because as far as I can see, we don't directly use WSAPoll() until now and the POLL constants are only used 'symbolically' on Windows, it became a real problem for the AIX port. To avoid a mapping of the Java constants to the native ones every time we go from Java to Native and back, this change replaces the currently used constants with a single set of constants which is placed in src/share/classes/sun/nio/ch/Net.java and which is dynamically initialized from native methods with the correct, platform-specific values. So this change replaces every occurrence of POLL*, Port.POLL* or AbstractPollArrayWrapper.POLL* in Java code with the corresponding Net.POLL* constants. However, there's one exception to this rule: I haven't changed the POLL constants in src/solaris/classes/sun/nio/ch/DevPollArrayWrapper.java.
That is not only because DevPollArrayWrapper is Solaris-specific and only used there, but mainly because it uses several non-standard POLL constants which are only available and meaningful on Solaris and I didn't want to bloat the number of generic constants defined in Net.java. I also didn't update the constants under src/solaris/demo/jni/Poller because that's a Solaris-specific demo anyway. With Windows, I was a little confused in the beginning. I think until now we haven't really used the WSAPoll() functionality there. That's probably because it has only been available since Windows Vista / Windows Server 2008 (see http://msdn.microsoft.com/en-us/library/windows/desktop/ms741669%28v=vs.85%29.aspx). As far as I understand, the constants are only used "symbolically", to drive the underlying, select() based implementation. I've therefore decided to use the "true" POLL constants on Windows if they are available. Otherwise I'll use the hard-wired (Solaris-based) values. Another Windows peculiarity is the fact that POLLCONN is defined with a value different from all other constants while it is equal to POLLOUT on the Unix platforms. I don't know if this is really necessary, but I kept this behaviour in my change. Notice that this change will allow us to directly use the WSAPoll() functionality on Windows in the future. I've compiled and smoke-tested the changes on Linux/x86_64/PPC64, Windows/x86_64, Solaris/SPARC, MacOS X and AIX. On all these platforms they pass all the java/nio JTREG tests in the same way as without this change. This means that on Linux/MacOS they pass all 261/256 tests, on Windows, they pass 258 tests while the following two tests fail: java/nio/channels/DatagramChannel/MulticastSendReceiveTests.java java/nio/channels/DatagramChannel/Promiscuous.java But as I wrote before, these two tests also fail without my changes applied, so I'm confident that the failures aren't related to this change.
On Solaris, 258 tests pass and only the java/nio/charset/Charset/default.sh test fails which isn't related to these changes either. Thank you and best regards, Volker From Alan.Bateman at oracle.com Fri Jan 17 03:44:48 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 11:44:48 +0000 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: Message-ID: <52D917B0.9090407@oracle.com> On 17/01/2014 11:25, Volker Simonis wrote: > : > > I've compiled and smoke-tested the changes on Linux/x86_64/PPC64, > Windows/x86_64, Solaris/SPARC, MacOS X and AIX. On all these platforms > they pass all the java/nio JTREG tests in the same way as without > this change. This means that on Linux/MacOS they pass all 261/256 > tests, on Windows, they pass 258 tests while the following two tests > fail: > > java/nio/channels/DatagramChannel/MulticastSendReceiveTests.java > java/nio/channels/DatagramChannel/Promiscuous.java > > But as I wrote before, these two tests also fail without my changes > applied, so I'm confident that the failures aren't related to this > change. Any chance that this is a firewall configuration or VPN that might be causing packets to be dropped? These tests should otherwise pass consistently on all platforms. If you have output from Linux or Solaris or OS X that you could send then it might help to diagnose this.
-Alan From goetz.lindenmaier at sap.com Fri Jan 17 05:30:23 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 17 Jan 2014 13:30:23 +0000 Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization In-Reply-To: <52D821CB.6020207@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> <52D821CB.6020207@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8D010@DEWDFEMB12A.global.corp.sap> Hi, I had a look at the first part of this issue: Whether StoreStore is necessary in the interpreter. Let's for now assume the serialization page mechanism works on PPC. In the state transition leaving the VM state, which is executed in the destructor, ThreadStateTransition::transition() is called, which executes if (UseMembar) { OrderAccess::fence(); } else { os::write_memory_serialize_page(thread); } os::write_memory_serialize_page() cannot be considered a proper MemBar, as it only serializes if another thread poisoned the page. Thus it does not qualify to order the initialization and the publishing of the object. You are right, if UseMembar is true, the StoreStore in the interpreter is superfluous. We could guard the StoreStores in the interpreter by !UseMembar. But then again, one is to order the publishing of the thread states, the other to enforce some Java semantics. I don't know whether everybody who changes one place is aware of both issues. But if you want to, I'll add a !UseMembar in the interpreter. Maybe it would be a good idea to document the double use in interfaceSupport.cpp, too. And maybe add an assertion of some kind. We're currently digging into the other issue, whether the serialization page works on ppc. We understand your concerns and have no simple answer to them right now. At least, in our VM and in the port there are no known problems with the state transitions. Best regards, Goetz.
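As a side illustration of the ordering requirement discussed above (initialize the object, issue a StoreStore barrier, then publish the reference), here is a minimal Java-level sketch. It is not code from the webrev or from HotSpot. The class, field and method names are hypothetical, and it assumes a JDK that provides java.lang.invoke.VarHandle (9 or later), which postdates this thread.

```java
import java.lang.invoke.VarHandle;

// Illustrative only: a Java-level analogue of "initialize, StoreStore, publish".
// PublishSketch, Box and shared are hypothetical names, not HotSpot code.
class PublishSketch {
    static final class Box {
        int value; // deliberately not final, so no final-field guarantees apply
    }

    static Box shared; // plain (non-volatile) field used as the publication point

    static void publish(int v) {
        Box b = new Box();
        b.value = v;                 // initializing store
        VarHandle.storeStoreFence(); // stores above are not reordered with stores below
        shared = b;                  // publishing store
    }

    public static void main(String[] args) {
        publish(42);
        System.out.println(shared.value);
    }
}
```

The sketch only shows the writer side. A thread that reads shared still needs suitable ordering on its own side, and the sketch says nothing about whether the serialization page provides that ordering on PPC.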
-----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 16. Januar 2014 19:16 To: David Holmes; Lindenmaier, Goetz Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers' Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization Changes are in C++ Interpreter so it does not affect Oracle VM. But David has point here. I would like to hear the explanation too. BTW, I see that for ppc64: src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false); as result write_memory_serialize_page() is used in ThreadStateTransition::transition(). Is it not enough on PPC64? Thanks, Vladimir On 1/15/14 9:30 PM, David Holmes wrote: > Can I get some response on this please - specifically the redundancy wrt > IRT_ENTRY actions. > > Thanks, > David > > On 20/12/2013 11:55 AM, David Holmes wrote: >> Still catching up ... >> >> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> this change adds StoreStore barriers after object initialization and >>> after constructor calls in the C++ interpreter. This assures no >>> uninitialized >>> objects or final fields are visible. >>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/ >> >> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize >> thread state transitions that already include a full "fence" so the >> storestore barriers are redundant in those cases. >> >> The fastpath _new storestore seems okay. >> >> I don't know how handle_return gets used to know if it is reasonable or >> not. >> >> I was trying, unsuccessfully, to examine the same code in the >> templateInterpreter to see how it handles these cases as it naturally >> has the same object-initialization-safety requirements (though these can >> be handled in a number of different ways other than an unconditional >> storestore barrier at the end of the initialization and construction >> phases. 
>> >> David >> ----- >> >>> Please review and test this change. >>> >>> Best regards, >>> Goetz. >>> From volker.simonis at gmail.com Fri Jan 17 05:45:26 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 14:45:26 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: <52D917B0.9090407@oracle.com> References: <52D917B0.9090407@oracle.com> Message-ID: On Fri, Jan 17, 2014 at 12:44 PM, Alan Bateman wrote: > On 17/01/2014 11:25, Volker Simonis wrote: >> >> : >> >> I've compiled and smoke-tested the changes on Linux/x86_64/PPC64, >> Windows/x86_64, Solaris/SPARC, MacOS X and AIX. On all these platforms >> they pass all the java/nio JTREG tests in the same way like without >> this change. This means that on Linux/MacOS they pass all 261/256 >> tests, on Windows, they pass 258 tests while the following two tests >> fail: >> >> java/nio/channels/DatagramChannel/MulticastSendReceiveTests.java >> java/nio/channels/DatagramChannel/Promiscuous.java >> >> But as I wrote before, these two test also fail without my changes >> applied, so I'm confident that the failures aren't related to this >> change. > > Any chance that this is firewall configuration or VPN that might be cause > packets to be dropped? These tests should otherwise pass consistently on all > platforms. If you have output from Linux or Solaris or OS X that you could > send then it might help to diagnose this. > Yes, you're right - it was because of a "VirtualBox Host-Only Network" network device which seems to fool the test. After I disabled it, all tests passed successfully! 
And what about the change itself :) > -Alan From volker.simonis at gmail.com Fri Jan 17 06:04:42 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 15:04:42 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: <52D92EBC.1040500@redhat.com> References: <52D92EBC.1040500@redhat.com> Message-ID: On Fri, Jan 17, 2014 at 2:23 PM, Florian Weimer wrote: > On 01/17/2014 12:25 PM, Volker Simonis wrote: >> >> To avoid a mapping of the Java constants to the native ones every time >> we go from Java to Native and back, this change replaces the currently >> used constants with a single instance of constants which is placed in >> src/share/classes/sun/nio/ch/Net.java and which are dynamically >> initialized from native methods with the correct, platform-specific >> values. > > > Previously, the constants were inlined at the class file level. Is there a > downside to not doing this? > Hi Florian, I already thought about this. From a performance perspective, I don't think it will have any impact considering that these constants will be used for doing native system calls anyway. It could of course be a problem on AIX (and only there) if third-party code had previously inlined these constants. But as these are internal classes (i.e. sun.nio.ch) which should only be used by the JDK itself, I hope there's not too much such code out there.
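To illustrate the inlining point above, here is a small sketch. It is not the code from the webrev. The class name and the value 0x0001 are hypothetical, and pollinValue() is a plain Java stand-in for what would be a native method in Net.java.

```java
// Illustrative only: compile-time constant vs. run-time initialized constant.
class PollConstantsSketch {
    // Initialized with a constant expression: javac treats this as a
    // compile-time constant and copies the value into every class that uses it.
    static final short INLINED_POLLIN = 0x0001;

    // Initialized by a method call, which is not a constant expression:
    // users of this field read it from this class at run time.
    static final short POLLIN = pollinValue();

    // Stand-in for a native method returning the platform value (hypothetical value).
    private static short pollinValue() {
        return 0x0001;
    }

    public static void main(String[] args) {
        System.out.println(INLINED_POLLIN == POLLIN);
    }
}
```

With the first form, classes compiled against one value keep that value until they are recompiled. With the second form the value is picked up from the defining class each time the program runs, which is what allows it to differ per platform.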
Regards, Volker > -- > Florian Weimer / Red Hat Product Security Team From fweimer at redhat.com Fri Jan 17 05:23:08 2014 From: fweimer at redhat.com (Florian Weimer) Date: Fri, 17 Jan 2014 14:23:08 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: Message-ID: <52D92EBC.1040500@redhat.com> On 01/17/2014 12:25 PM, Volker Simonis wrote: > To avoid a mapping of the Java constants to the native ones every time > we go from Java to Native and back, this change replaces the currently > used constants with a single instance of constants which is placed in > src/share/classes/sun/nio/ch/Net.java and which are dynamically > initialized from native methods with the correct, platfrom-specific > values. Previously, the constants where inlined at the class file level. Is there a downside to not doing this? -- Florian Weimer / Red Hat Product Security Team From fweimer at redhat.com Fri Jan 17 06:14:12 2014 From: fweimer at redhat.com (Florian Weimer) Date: Fri, 17 Jan 2014 15:14:12 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: <52D92EBC.1040500@redhat.com> Message-ID: <52D93AB4.50402@redhat.com> On 01/17/2014 03:04 PM, Volker Simonis wrote: > On Fri, Jan 17, 2014 at 2:23 PM, Florian Weimer wrote: >> On 01/17/2014 12:25 PM, Volker Simonis wrote: >>> >>> To avoid a mapping of the Java constants to the native ones every time >>> we go from Java to Native and back, this change replaces the currently >>> used constants with a single instance of constants which is placed in >>> src/share/classes/sun/nio/ch/Net.java and which are dynamically >>> initialized from native methods with the correct, platfrom-specific >>> values. >> >> >> Previously, the constants where inlined at the class file level. Is there a >> downside to not doing this? > I already thought about this. 
From a performance perspective, I don't > think it will have any impact considering that these constants will be > used for doing native system calls anyway. True, and the few extra bytes of class data hopefully don't matter (although they keep adding up). > It could of course be a problem on AIX (and only there) if third-party > code had previously inlined these constants. But as these are internal > classes (i.e. sun.nio.ch) which should only be used by the JDK itself, > I hope there's not too much such code out there. Some of the classes weren't even public until 8, so this risk is reduced even further. -- Florian Weimer / Red Hat Product Security Team From Alan.Bateman at oracle.com Fri Jan 17 07:16:54 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 15:16:54 +0000 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: <52D917B0.9090407@oracle.com> Message-ID: <52D94966.8090407@oracle.com> On 17/01/2014 13:45, Volker Simonis wrote: > : > Yes, you're right - it was because of a "VirtualBox Host-Only Network" > network device which seems to fool the test. After I disabled it, all > tests passed successfully! > > And what about the change itself :) > The change itself looks mostly okay. For naming, I think I would have a slight preference for something like pollinValue over getNativePollin so that it is somewhat consistent with the other places where we do this (like in epoll code with eventSize, eventsOffset, ...). Naming is subjective of course so this isn't a big issue. I suspect you can drop POLLREMOVE, that is only used in the /dev/poll Selector and it has its own definition (and shouldn't be compiled on anything other than Solaris anyway).
A minor comment on DatagramChannelImpl, SourceChannelImpl and a few more where replacing PollArrayWrapper.POLL* with Net.POLL* means it is no longer necessary to split lines (it just might be neater to bring these cases back onto one line again). I suspect we will be able to drop the changes to the Windows nio_util.h soon as these older versions of Windows are not long for this world. I assume that by taking on newer VC++ it won't even be possible to build or run there either. -Alan. From Alan.Bateman at oracle.com Fri Jan 17 07:50:27 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 15:50:27 +0000 Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX In-Reply-To: References: Message-ID: <52D95143.6080601@oracle.com> On 14/01/2014 16:57, Volker Simonis wrote: > Hi, > > could you please review the following change: > > http://cr.openjdk.java.net/~simonis/webrevs/8028537/ > > which, together with the changes from "8031581: PPC64: Addons and fixes for > AIX to pass the jdk regression tests" and "8031134 : PPC64: implement > printing on AIX" enables our port to pass all but the following 7 jtreg > regression tests on AIX (compared to the Linux/x86_64 baseline from > www.java.net/download/jdk8/testresults/testresults.html?): > I've finally got to this one. As the event translation issue is now a separate issue, I've ignored that part. I'm not comfortable with the changes to FileDispatcherImpl.c as I don't think we should be calling into IO_ or NET_* functions here. I think I get the issue that you have on AIX (and assume it's the preClose/dup2 that blocks rather than close) but need a bit of time to suggest alternatives. It may be that it will require an AIX specific SocketDispatcher. Do you happen to know which tests fail due to this part? The other changes look okay. There is a typo in the change to zip_util.c, s/legel/legal/. In DatagramChannelImpl.c then you handle connect failing with EAFNOSUPPORT.
I would be tempted to replace the comment to say that EAFNOSUPPORT can be ignored on AIX. A minor comment but the indentation for rv = errno can be fixed (I see the BSD code has it wrong too). -Alan. From volker.simonis at gmail.com Fri Jan 17 09:42:26 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 18:42:26 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: <52D94966.8090407@oracle.com> References: <52D917B0.9090407@oracle.com> <52D94966.8090407@oracle.com> Message-ID: On Fri, Jan 17, 2014 at 4:16 PM, Alan Bateman wrote: > On 17/01/2014 13:45, Volker Simonis wrote: >> >> : >> Yes, you're right - it was because of a "VirtualBox Host-Only Network" >> network device which seems to fool the test. After I disabled it, all >> tests passed successfully! >> >> And what about the change itself :) >> > The change itself looks mostly okay. > > For naming, I think I would have a slight preference for something like > pollinValue over getNativePollin so that it is somewhat consistent with the other > places where we do this (like in epoll code with eventSize, eventsOffset, > ...). Naming is subjective of course so this isn't a big issue. > Done. > I suspect you can drop POLLREMOVE, that is only used in the /dev/poll > Selector and it has its own definition (and shouldn't be compiled on > anything other than Solaris anyway). > Removed POLLREMOVE from sun.nio.ch.Net. > A minor comment on DatagramChannelImpl, SourceChannelImpl and a few more > where replacing PollArrayWrapper.POLL* with Net.POLL* means it is no > longer necessary to split lines (it just might be neater to bring these cases > back onto one line again). > Done.
> I suspect we will be able to drop the changes to the Windows nio_util.h soon > as these older versions of Windows are not long for this world. I assume > that by taking on newer VC++ that it won't even be possible to build or run > either. > We still support Server 2003 :( > -Alan. Here's the new webrev: http://cr.openjdk.java.net/~simonis/webrevs/8031997_2/ Built and tested like before. Everything OK. Is this now ready for push into ppc-aix-port/stage-9? Regards, Volker PS: I've added you as reviewer, but unfortunately after I created the webrev. From volker.simonis at gmail.com Fri Jan 17 10:56:37 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 19:56:37 +0100 Subject: [OpenJDK 2D-Dev] RFR(S): JDK-8031134 : PPC64: implement printing on AIX In-Reply-To: <52D82A95.7030803@oracle.com> References: <52D82A95.7030803@oracle.com> Message-ID: Hi Phil, first of all, thanks a lot for your review. Please find my answers and comments inline: On Thu, Jan 16, 2014 at 7:53 PM, Phil Race wrote: > Hello Volker, > > Interesting that all this is needed. How has AIX got by before ? > Well, there hasn't been AIX support in OpenJDK until now so it obviously didn't work:) > Is this taken from an existing IBM port or did you write this yourself ? > This is from our certified, long running (Java 4,5,6,7), commercial SAP JVM (proudly written by ourselves). > I'd hope you are getting help directly from IBM in this area. > Sometimes:) We're doing the OpenJDK port together with IBM. (That is, we're actually merging our existing, commercial ports in.) > I suppose that if CUPS is configured and running it'll take precedence > over these > as it does for the other cases. Someone with AIX should test that at some > point as > its the only way to know for sure that that works. > There is no such things like CUPS or fontconfig on AIX. Of course somebody may compile, configure and run it by himself. 
But that's neither the default setting on AIX nor is it supported by IBM. I have no such box and I don't know any customer that would use it. But believe me, our customers have been using our implementation for a long time - and it is working (even printing). > I'd hoped that AIX would be able to fit into either the BSD or SysV > printing mold > but it seems like it's got its own special commands > - lsallq is a new one on me and looks like an AIX special > Yes, it's an AIX special (from the man page "The lsallq command lists the names of all configured queues contained in the /etc/qconfig file."). However, sometimes it can really save your life, especially if you have customers who have configured thousands of printers and 'lpstat -p' takes hours to come back (we've seen that!). > - the "-W" arg to lpstat is completely different than what it means on OS X > or Linux ! > and there's the oddity that it doesn't expect spaces between the flag and > the value .. > so I think your approach is the best one. > > A few specific comments :- > > } else if (aixPrinterEnumerator.equalsIgnoreCase("lsallq")) { > 144 aix_defaultPrinterEnumeration = aix_lsallq; > 145 } > > this code seems redundant since it's just reasserting the default unless > you intend that this be something that can change multiple times but > I'd advise against that and anyway it's in a static block so that won't > happen .. > We've documented both values for "sun.java2d.print.aix.lpstat" to be able to change the default. So the user should be able to set both values. And no, the value won't be changed anywhere else. > I see you defined > > 167 static boolean isAIX( ) { > 168 return osname.equals("AIX"); > 169 } > 170 > > so why are you using this here :- > 136 if (osname.equals("AIX")) { > > instead of calling isAIX() as you do elsewhere ? > Fixed. > ... > > } else if (names.length != 1) { > // No default printer found > > In the other cases we chose to nominate the 1st printer as the default.
> This seemed a better choice than making apps deal with a list of > installed printers but no default. Not sure what problems you might > encounter with this (if any). If the situation never occurs its not an > issue. > You're right. This was probably a copy-and-past left over. I think there's no meaningful default printer name on AIX and if 'lpstat -d' doesn't return anything useful then there's probably a problem and we better return 'null'. Here's the new webrev: http://cr.openjdk.java.net/~simonis/webrevs/8031134_2/ Is this now ready for pushing into ppc-aix-port/stage-9? Thank you and best regards, Volker > -phil. > > > On 1/16/14 12:08 AM, Volker Simonis wrote: > >> Resending one more time to 2d-dev upon request: >> >> Hi, >> >> could somebody please review the following small change: >> >> http://cr.openjdk.java.net/~simonis/webrevs/8031134/ >> >> It's the straight forward implementation of the basic printing >> infrastructure on AIX and shouldn't have any impact on the existing >> platforms. As always, this change is intended for the >> http://hg.openjdk.java.net/ppc-aix-port/stage/jdk repository. >> >> Thank you and best regards, >> Volker >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140117/b8f379b3/attachment.html From volker.simonis at gmail.com Fri Jan 17 13:10:22 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 22:10:22 +0100 Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX In-Reply-To: <52D8FC7E.5080908@oracle.com> References: <52D58C55.6070004@oracle.com> <52D8FC7E.5080908@oracle.com> Message-ID: Hi Alan, thanks for looking at this. Please find may comments inline: On Fri, Jan 17, 2014 at 10:48 AM, Alan Bateman wrote: > On 15/01/2014 16:42, Volker Simonis wrote: >> >> Hi Alan, >> >> thanks for the suggestion. That's fine for me. 
I've copied the empty SCTP >> stubs from the macosx to the aix directory as well and updated the make file >> accordingly (in the patch for "8031581: PPC64: Addons and fixes for AIX to >> pass the jdk regression tests"). >> >> Therefore, the changes to the three tests: >> >> test/com/sun/nio/sctp/SctpChannel/Util.java >> test/com/sun/nio/sctp/SctpMultiChannel/Util.java >> test/com/sun/nio/sctp/SctpServerChannel/Util.java >> >> can be considered obsolete. > > Thanks, I think this makes the most sense. > > I looked through the rest of this webrev and the update to the tests looks > fine. > Great, thanks. > One general comment is that for many of these shell tests (that survive the > current effort to replace them) is that we could move the Unix handling into > the match any case so that we don't have to list each of Linux, SunOS, > Darwin, ... I think this came up when the OS X port was brought in but > there wasn't any follow-up on it. I am not suggesting you do this here, it's > just a comment as I see same change to so many tests. > > A minor comment on SBC.java is that it could just catch > UnsupportedOperationException on L238, that would avoid needing to check > os.name. > I agree, that looks much nicer. Done as requested. > A really minor comment on the updates to ProblemList.txt is that the JMX > test should probably be in the jdk_jmx section (it's just a convention that > we've been using, it doesn't of course really matter where tests are > listed). Done. Moved the excluded tests down to the jdk_jmx section. Here's the new webrev: http://cr.openjdk.java.net/~simonis/webrevs/8028537_2/ Can I push this now to ppc-aix-port/stage-9? 
Thank you and best regards, Volker > > -Alan From volker.simonis at gmail.com Fri Jan 17 13:15:08 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 22:15:08 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: <52D50118.3080000@oracle.com> References: <52D50118.3080000@oracle.com> Message-ID: On Tue, Jan 14, 2014 at 10:19 AM, Alan Bateman wrote: > On 14/01/2014 08:40, Volker Simonis wrote: >> >> Hi, >> >> could you please review the following changes for the ppc-aix-port >> stage/stage-9 repositories (the changes are planned for integration into >> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): > > I'd like to review this but I won't have time until later in the week. From > an initial look then there are a few things that are not pretty (the changes to > fix the AIX problems with I/O cancellation in particular) and I suspect that > some refactoring is going to be required to handle some of this cleanly. A > minor comment is that the bug synopsis doesn't really communicate what these > changes are about. > > -Alan. Just forwarded the following message from another thread here where it belongs: On 17/01/2014 16:57, Alan Bateman wrote: I've finally got to this one. As the event translation issue is now a separate issue, I've ignored that part. I'm not comfortable with the changes to FileDispatcherImpl.c as I don't think we should be calling into IO_ or NET_* functions here. I think I get the issue that you have on AIX (and assume it's the preClose/dup2 that blocks rather than close) but need a bit of time to suggest alternatives. It may be that it will require an AIX specific SocketDispatcher. Do you happen to know which tests fail due to this part? The other changes look okay. There is a typo in the change to zip_util.c, s/legel/legal/. In DatagramChannelImpl.c then you handle connect failing with EAFNOSUPPORT.
I would be tempted to replace the comment to say that EAFNOSUPPORT can be ignored on AIX. A minor comment but the indentation for rv = errno can be fixed (I see the BSD code has it wrong too). From Alan.Bateman at oracle.com Fri Jan 17 13:21:26 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 21:21:26 +0000 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: <52D917B0.9090407@oracle.com> <52D94966.8090407@oracle.com> Message-ID: <52D99ED6.1070806@oracle.com> On 17/01/2014 17:42, Volker Simonis wrote: > : > Here's the new webrev: > > http://cr.openjdk.java.net/~simonis/webrevs/8031997_2/ > > Built and tested like before. Everything OK. > > Is this now ready for push into ppc-aix-port/stage-9? > Thanks for the updates, this looks good to me. -Alan. From Alan.Bateman at oracle.com Fri Jan 17 13:24:56 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 21:24:56 +0000 Subject: RFR(S/L): 8028537: PPC64: Updated the JDK regression tests to run on AIX In-Reply-To: References: <52D58C55.6070004@oracle.com> <52D8FC7E.5080908@oracle.com> Message-ID: <52D99FA8.1060108@oracle.com> On 17/01/2014 21:10, Volker Simonis wrote: > : > Done. Moved the excluded tests down to the jdk_jmx section. > > Here's the new webrev: > > http://cr.openjdk.java.net/~simonis/webrevs/8028537_2/ > > Can I push this now to ppc-aix-port/stage-9? > The SBC change looks good. I can't see the move of the excluded test but it's not important anyway. So I think this one is good to go. -Alan. From philip.race at oracle.com Fri Jan 17 15:21:10 2014 From: philip.race at oracle.com (Phil Race) Date: Fri, 17 Jan 2014 15:21:10 -0800 Subject: [OpenJDK 2D-Dev] RFR(S): JDK-8031134 : PPC64: implement printing on AIX In-Reply-To: References: <52D82A95.7030803@oracle.com> Message-ID: <52D9BAE6.8000108@oracle.com> Hi, On 1/17/14 10:56 AM, Volker Simonis wrote: > Hi Phil, > > first of all, thanks a lot for your review.
Please find my answers and > comments inline: > > On Thu, Jan 16, 2014 at 7:53 PM, Phil Race > wrote: > > Hello Volker, > > Interesting that all this is needed. How has AIX got by before ? > > > Well, there hasn't been AIX support in OpenJDK until now so it > obviously didn't work:) I meant the IBM JDK. > > Is this taken from an existing IBM port or did you write this > yourself ? > > > This is from our certified, long running (Java 4,5,6,7), commercial > SAP JVM (proudly written by ourselves). Got it. Still if IBM have something else in this area they should speak up ! > > I'd hope you are getting help directly from IBM in this area. > > > Sometimes:) We're doing the OpenJDK port together with IBM. (That is, > we're actually merging our existing, commercial ports in.) > > I suppose that if CUPS is configured and running it'll take > precedence over these > as it does for the other cases. Someone with AIX should test that > at some point as > its the only way to know for sure that that works. > > > There is no such things like CUPS or fontconfig on AIX. Of course > somebody may compile, configure and run it by himself. But that's > neither the default setting on AIX nor is it supported by IBM. I have > no such box and I don't know any customer that would use it. Hmm google 'cups aix' suggests it does exist and get used but it may not be supported .. > > But believe me, our customers are using our implementation since long > time - and it is working (even printing). > > I'd hoped that AIX would be able to fit into either the BSD or > SysV printing mold > but it seems like its got its own special commands > - lsallq is a new one on me and looks like an AIX special > > > Yes, it's AIX special (from the man page " The lsallq command lists > the names of all configured queues contained in the /etc/qconfig > file."). 
However, sometimes it can really save your life, especially > if you have costumers which have configured thousands of printers and > 'lpstat -p' takes hours to come back (we've seen that!). > > - the "-W" arg to lpstat is completely different than what it > means on OS X or Linux ! > and there's the oddity that it doesn't expect spaces between the > flag and the value .. > so I think your approach is the best one. > > A few specific comments :- > > } else if (aixPrinterEnumerator.equalsIgnoreCase("lsallq")) { > 144 aix_defaultPrinterEnumeration = aix_lsallq; > 145 } > > this code seems redundant since its just reasserting the default > unless > you intend that this be something that can change multiple times but > I'd advise against that and anyway its in static block so that > won't happen .. > > > We've documented both values for "sun.java2d.print.aix.lpstat" to be > able to change the default. So the user should be able to set both > values. And no, the value won't be changed anywhere else. I wasn't saying they can't set both, just that the code to re-assert the hard coded default isn't adding anything so I personally would not add the byte code but its up to you. You can push as is if you like. Your hard coded default is 20 or so lines further up : - private static int aix_defaultPrinterEnumeration = aix_lsallq; > > I see you defined > > 167 static boolean isAIX( ) { > 168 return osname.equals("AIX"); > 169 } > 170 > > so why are you using this here :- > 136 if (osname.equals("AIX")) { > > instead of calling isAIX() as you do elsewhere ? > > > Fixed. > > ... > > } else if (names.length != 1) { > // No default printer found > > In the other cases we chose to nominate the 1st printer as the > default. > This seemed a better choice than making apps deal with a list of > installed printers but no default. Not sure what problems you might > encounter with this (if any). If the situation never occurs its > not an issue. > > > You're right. 
This was probably a copy-and-paste leftover. > I think there's no meaningful default printer name on AIX and if > 'lpstat -d' doesn't return anything useful then there's probably a > problem and we had better return 'null'. > > Here's the new webrev: > > http://cr.openjdk.java.net/~simonis/webrevs/8031134_2/ > > > Is this now ready for pushing into ppc-aix-port/stage-9? Yes -phil. > > Thank you and best regards, > Volker > > -phil. > > > On 1/16/14 12:08 AM, Volker Simonis wrote: > > Resending one more time to 2d-dev upon request: > > Hi, > > could somebody please review the following small change: > > http://cr.openjdk.java.net/~simonis/webrevs/8031134/ > > > It's the straightforward implementation of the basic printing > infrastructure on AIX and shouldn't have any impact on the > existing > platforms. As always, this change is intended for the > http://hg.openjdk.java.net/ppc-aix-port/stage/jdk repository. > > Thank you and best regards, > Volker > > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20140117/611ddec0/attachment-0001.html From vladimir.kozlov at oracle.com Fri Jan 17 17:05:35 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jan 2014 17:05:35 -0800 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> Message-ID: <52D9D35F.2050501@oracle.com> Goetz, I asked to remove both comments in parse.hpp, comments in parse1.cpp and parse3.cpp is enough. It should be similar to _wrote_final: bool _wrote_volatile; // Did we write a final field? I think the next comment in parse3.cpp should be modified: + // But remember we wrote a volatile field so that a barrier can be issued + // in constructors. See do_exits() in parse1.cpp. // Remember we wrote a volatile field. // For not multiple copy atomic cpu (ppc64) a barrier should be issued // in constructors which have such stores. 
See do_exits() in parse1.cpp. Thanks, Vladimir On 1/17/14 12:39 AM, Lindenmaier, Goetz wrote: > Hi, > > I tried to come up with a webrev that implements the change as proposed in > your mails: > http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ > > Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use > support_IRIW_for_not_multiple_copy_atomic_cpu. > > I left the definition and handling of _wrote_volatile in the code, without > any protection. > I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , > and put it on one line. > > I removed the comment in library_call.cpp. > I also removed the sentence " Solution: implement volatile read as sync-load-acquire." > from the comments as it's PPC specific. > > Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these > issues in C1 if nobody did it by then. > > Wrt. to performance: Oracle will soon do heavy testing of the port. If any > performance problems arise, we still can add #ifdef PPC64 to circumvent this. > > Best regards, > Goetz. > > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Donnerstag, 16. Januar 2014 10:05 > To: Vladimir Kozlov > Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >> On 1/16/14 12:34 AM, David Holmes wrote: >>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>> want to keep it this way, it could be useful to have such info on other >>>> platforms too. But I would suggest to remove PPC64 comments in >>>> parse.hpp. 
>>>> >>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>> which could be checked in all places instead of #ifdef: >>> >>> I asked for the ifdef some time back as I find it much preferable to >>> have this as a build-time construct rather than a >>> runtime one. I don't want to have to pay anything for this if we don't >>> use it. >> >> Any decent C++ compiler will optimize expressions with such constants >> defined in header files. I insist to avoid #ifdefs in C2 code. I really >> don't like the code with #ifdef in unsafe.cpp but I can live with it. > > If you insist then we may as well do it all the same way. Better to be > consistent. > > My apologies Goetz for wasting your time going back and forth on this. > > That aside I have a further concern with this IRIW support - it is > incomplete as there is no C1 support, as PPC64 isn't using client. If > this is going on then we (which probably means the Oracle 'we') need to > add the missing C1 code. > > David > ----- > >> Vladimir >> >>> >>> David >>> >>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>> #else >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>> #endif >>>> >>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>> >>>> and then: >>>> >>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>> oop p = JNIHandles::resolve(obj); \ >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>> OrderAccess::fence(); \ >>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>> >>>> And: >>>> >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>> field->is_volatile()) { >>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>> + } >>>> >>>> And so on. 
The comments will be needed only in globalDefinitions.hpp >>>> >>>> The code in parse1.cpp could be put on one line: >>>> >>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>> method()->is_initializer()) )) { >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>> Hi David, >>>>>> >>>>>> I updated the webrev: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> >>>>>> - I removed the IRIW example in parse3.cpp >>>>>> - I adapted the comments not to point to that comment, and to >>>>>> reflect the new flagging. Also I mention that we support the >>>>>> volatile constructor issue, but that it's not standard. >>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>> I also think it's better to separate these this way. >>>>> >>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>> please. >>>>> >>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>> comment doesn't make much sense to me and refers to >>>>> ppc specific stuff in a shared file: >>>>> >>>>> if (is_volatile) { >>>>> ! if (!is_store) { >>>>> insert_mem_bar(Op_MemBarAcquire); >>>>> ! } else { >>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>> insert_mem_bar(Op_MemBarVolatile); >>>>> + #endif >>>>> + } >>>>> >>>>> I don't think the comment is needed. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks for your comments! >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Mittwoch, 15. 
Januar 2014 01:55 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>> 'hotspot-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi Goetz, >>>>>> >>>>>> Sorry for the delay in getting back to this. >>>>>> >>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>> specifically it is >>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>> the commentary excessive, particularly for shared code. In particular >>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>> explanation and I don't think we need it to that level of detail. >>>>>> Seems >>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>> >>>>>> The changes related to volatile writes in the constructor, as >>>>>> discussed >>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>> PPC64 porters. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>> this version. >>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Lindenmaier, Goetz >>>>>>> Sent: Freitag, 20. 
Dezember 2013 13:47 >>>>>>> To: David Holmes >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi David, >>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>> not >>>>>>>> required to pass. >>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>> worked >>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>> it by >>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>> part is >>>>>>> not that performance relevant. >>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>> I added a compile-time guard in this new webrev: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>> several double negations I don't like, (#ifNdef >>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>> but this way I only have to change the ppc platform. >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz >>>>>>> >>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>> emails. 
>>>>>>> >>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>> settled. >>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>> >>>>>>>> About our change: David, the causality is the other way round. >>>>>>>> The change is about IRIW. >>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>> >>>>>>> This is the part I still have some question marks over as the >>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>> But I'm >>>>>>> no further along in processing that paper I'm afraid. >>>>>>> >>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>> stores. >>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>> constructor tests. >>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>> after stores >>>>>>>> to pass the volatile constructor tests. >>>>>>> >>>>>>> So we can at least undo #4 now we have established those tests >>>>>>> were not >>>>>>> required to pass. >>>>>>> >>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>> order >>>>>>>> instructions are not as fine-granular as the >>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>> The only instruction >>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>>> therefore the >>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>> proper representation >>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>> it's pointless >>>>>>>> anyways. >>>>>>>> >>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>> I'd be happy to add a property >>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>> >>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>> think >>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>> based not architecture based) as that will allow turning this >>>>>>> on/off for any architecture for testing purposes. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>> Or also >>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>> >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Thursday, 28 November 2013 00:34 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> TL;DR version: >>>>>>>> >>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>> constructor-barrier >>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>> *may* >>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>> ensure >>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>> invalid >>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>> need it >>>>>>>> due to other factors). >>>>>>>> >>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>> term :) >>>>>>>> Second thanks for the reference to that paper!
For reference: >>>>>>>> >>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>> visible to >>>>>>>> all other hardware threads at the same time point; these >>>>>>>> architectures >>>>>>>> are not multiple-copy atomic." >>>>>>>> >>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>> ARM and >>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>> current issue is what you wrote below: >>>>>>>> >>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>> > remove the sync instruction from behind stores >>>>>>>> (parse3.cpp:320) >>>>>>>> > and place it before loads. >>>>>>>> >>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>> after >>>>>>>> the store then you have to do something around the loads to get the >>>>>>>> same >>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>> load - >>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>> clearer. >>>>>>>> >>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>> volatile_store were factored out so that they could be >>>>>>>> implemented as >>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>> hooks >>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>> >>>>>>>> I'm not happy with the ifdef approach but I won't block it. 
I think >>>>>>>> this >>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>> >>>>>>>>> I did not yet add the >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> or >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>> >>>>>>>>> >>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>> >>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>> >>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>> store >>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>> (volatile store in constructor). >>>>>>>>> >>>>>>>>> >>>>>>>>> @David >>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>> continuous. 
>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>> issue. >>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>> you >>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>> solution, >>>>>>>>> we would volunteer to implement and test it. >>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>> issues. >>>>>>>>> >>>>>>>>> @David >>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>> and >>>>>>>>>> can't find any reference to it. >>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>> ARM and >>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>> Work-Stealing for >>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>>> taskqueue problem. >>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>> >>>>>>>>> I was wrong about one thing, it's called multiple copy atomicity, I >>>>>>>>> used 'read' >>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>> above). >>>>>>>>> >>>>>>>>> Best regards and thanks for all your involvement, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Wednesday, 27.
November 2013 12:53 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> -- Volatile in constructor >>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>> missing constructor barrier. >>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>> processors >>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>> Aleksey!) >>>>>>>>> >>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>> >>>>>>>>>> -- IRIW issue >>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>> a few >>>>>>>>>>> moments thought. >>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>> problem yet, >>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>> >>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>> continuous. >>>>>>>>> >>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>> different?) >>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>> I contributed a >>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>> nice, I admit. >>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>> are evaluated >>>>>>>>>> by the C-compiler I'm happy.
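A minimal sketch of the kind of compiler-evaluated guard asked for here, assuming the `CPU_NOT_MULTIPLE_COPY_ATOMIC` macro and the constant name that appear elsewhere in this thread. The functions `full_fence`, `volatile_load` and `guard_demo` are hypothetical stand-ins for illustration only, not HotSpot code. The idea is that a single `#ifdef` defines a `const bool`, and every use site is an ordinary `if` that an optimizing compiler can fold away when the constant is false.

```cpp
#include <atomic>
#include <cassert>

// The only #ifdef: the platform macro (named as in this thread) becomes a constant.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical stand-in for OrderAccess::fence(); it counts calls so that
// the effect of the guard can be observed.
static int fence_count = 0;
static void full_fence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ++fence_count;
}

// Hypothetical stand-in for a volatile field read: a full fence first
// (only where IRIW support is needed), then a load-acquire.
int volatile_load(const std::atomic<int>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    full_fence();  // this branch is dead code when the constant is false
  }
  return field.load(std::memory_order_acquire);
}

// Self-check: the load returns the stored value, and a fence is issued
// exactly when the constant says the platform needs one.
bool guard_demo() {
  std::atomic<int> x(42);
  int before = fence_count;
  int v = volatile_load(x);
  int expected_fences = support_IRIW_for_not_multiple_copy_atomic_cpu ? 1 : 0;
  assert(v == 42);
  return (fence_count - before) == expected_fences;
}
```

Calling `guard_demo()` from a small `main` exercises both the load and the guard. Building with `-DCPU_NOT_MULTIPLE_COPY_ATOMIC` switches the fence on without touching any use site.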
>>>>>>>>>> >>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>> problem >>>>>>>>>> is that >>>>>>>>>> store >>>>>>>>>> sync >>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>> only >>>>>>>>>> sync >>>>>>>>>> load >>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>> pass that test. >>>>>>>>> >>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>> term and >>>>>>>>> can't find any reference to it. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>> The JMM is fine. And >>>>>>>>>> store >>>>>>>>>> MemBarVolatile >>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>> that >>>>>>>>>> do what is required. >>>>>>>>>> >>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi everybody, >>>>>>>>>>> >>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>> I'll try to answer to all in one mail. 
>>>>>>>>>>> >>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>> initializing >>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>> Shipilev >>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>> LongVolatileTest >>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>> will fail. >>>>>>>>>>> (In addition, observing 0 instead of the initial value of a >>>>>>>>>>> volatile field would be >>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>> AtomicInteger.) >>>>>>>>>> >>>>>>>>>> The effects of unsafe publication are always surprising - >>>>>>>>>> volatiles do >>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>> that >>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>> Aleksey >>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>> missing constructor barrier. >>>>>>>>>> >>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>> heavyweight >>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>> don't >>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>> >>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>> OrderAccess operations >>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>> account >>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>> different platforms >>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>> An example: >>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>> this: >>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>> Just doing what is required. >>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>> fine grained operations: >>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>> are more (too) powerful, >>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>> can not implement >>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>> >>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>> issue. >>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>> read >>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>> >>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>> independent? >>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>> >>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>> few >>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>> will >>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>> moment. 
The >>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>> requirements for >>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>> are not >>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>> store >>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>> of the >>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>> software >>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>> don't have >>>>>>>>>> that on ppc!) >>>>>>>>>> >>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>> I will >>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> -- Other ports: >>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>> it might >>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>> them). >>>>>>>>>>> >>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>> can >>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>> way :) >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>> >>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>> fix an >>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>> IRIW >>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>> doing the >>>>>>>>>>> load. >>>>>>>>>>> >>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>> would be >>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>> performance >>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>> that >>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>> this is >>>>>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>>>>> with the >>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>> the >>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>> properties >>>>>>>>>>> independent of architecture. 
If such a "barrier" is not needed >>>>>>>>>>> on a >>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>> reduce >>>>>>>>>>> to a no-op. >>>>>>>>>>> >>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>> under >>>>>>>>>>> the JMM. >>>>>>>>>>> >>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>> current >>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>> is set >>>>>>>>>>>> depending on platform: >>>>>>>>>>>> >>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>> >>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>> Java >>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>> explicitly >>>>>>>>>>> like this. >>>>>>>>>>> >>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>> there >>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>> defines? >>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>> David? >>>>>>>>>>> >>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>> inside else {}. >>>>>>>>>>>> >>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>> specific which >>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>> #ifdefs. 
>>>>>>>>>>>> >>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>> >>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>> #else >>>>>>>>>>>> if (is_field) { >>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>> volatile field >>>>>>>>>>>> (PPC64). >>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>> } >>>>>>>>>>>> #endif >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>> >>>>>>>>>>>>> Example: >>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>> >>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>> >>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>> is only >>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>> threads >>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>> >>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>> fields >>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>> published. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>> seems too >>>>>>>>>>>>> strong. >>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>> suffice. >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> From goetz.lindenmaier at sap.com Sat Jan 18 02:36:02 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Sat, 18 Jan 2014 10:36:02 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52D9D35F.2050501@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52D9D35F.2050501@oracle.com> Message-ID: 
<4295855A5C1DE049A61835A1887419CC2CE8D566@DEWDFEMB12A.global.corp.sap> Hi, I updated the comments in the webrev accordingly. http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ Best regards, Goetz. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Saturday, January 18, 2014 2:06 AM To: Lindenmaier, Goetz; David Holmes Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Goetz, I asked to remove both comments in parse.hpp, comments in parse1.cpp and parse3.cpp is enough. It should be similar to _wrote_final: bool _wrote_volatile; // Did we write a final field? I think the next comment in parse3.cpp should be modified: + // But remember we wrote a volatile field so that a barrier can be issued + // in constructors. See do_exits() in parse1.cpp. // Remember we wrote a volatile field. // For not multiple copy atomic cpu (ppc64) a barrier should be issued // in constructors which have such stores. See do_exits() in parse1.cpp. Thanks, Vladimir On 1/17/14 12:39 AM, Lindenmaier, Goetz wrote: > Hi, > > I tried to come up with a webrev that implements the change as proposed in > your mails: > http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ > > Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use > support_IRIW_for_not_multiple_copy_atomic_cpu. > > I left the definition and handling of _wrote_volatile in the code, without > any protection. > I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , > and put it on one line. > > I removed the comment in library_call.cpp. > I also removed the sentence " Solution: implement volatile read as sync-load-acquire." > from the comments as it's PPC specific. > > Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these > issues in C1 if nobody did it by then. > > Wrt. 
to performance: Oracle will soon do heavy testing of the port. If any > performance problems arise, we still can add #ifdef PPC64 to circumvent this. > > Best regards, > Goetz. > > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Donnerstag, 16. Januar 2014 10:05 > To: Vladimir Kozlov > Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >> On 1/16/14 12:34 AM, David Holmes wrote: >>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>> want to keep it this way, it could be useful to have such info on other >>>> platforms too. But I would suggest to remove PPC64 comments in >>>> parse.hpp. >>>> >>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>> which could be checked in all places instead of #ifdef: >>> >>> I asked for the ifdef some time back as I find it much preferable to >>> have this as a build-time construct rather than a >>> runtime one. I don't want to have to pay anything for this if we don't >>> use it. >> >> Any decent C++ compiler will optimize expressions with such constants >> defined in header files. I insist to avoid #ifdefs in C2 code. I really >> don't like the code with #ifdef in unsafe.cpp but I can live with it. > > If you insist then we may as well do it all the same way. Better to be > consistent. > > My apologies Goetz for wasting your time going back and forth on this. > > That aside I have a further concern with this IRIW support - it is > incomplete as there is no C1 support, as PPC64 isn't using client. If > this is going on then we (which probably means the Oracle 'we') need to > add the missing C1 code. 
> > David > ----- > >> Vladimir >> >>> >>> David >>> >>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>> #else >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>> #endif >>>> >>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>> >>>> and then: >>>> >>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>> oop p = JNIHandles::resolve(obj); \ >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>> OrderAccess::fence(); \ >>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>> >>>> And: >>>> >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>> field->is_volatile()) { >>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>> + } >>>> >>>> And so on. The comments will be needed only in globalDefinitions.hpp >>>> >>>> The code in parse1.cpp could be put on one line: >>>> >>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>> method()->is_initializer()) )) { >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>> Hi David, >>>>>> >>>>>> I updated the webrev: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> >>>>>> - I removed the IRIW example in parse3.cpp >>>>>> - I adapted the comments not to point to that comment, and to >>>>>> reflect the new flagging. Also I mention that we support the >>>>>> volatile constructor issue, but that it's not standard. >>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>> I also think it's better to separate these this way. >>>>> >>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>> please. >>>>> >>>>> One nit I missed before. 
In src/share/vm/opto/library_call.cpp this >>>>> comment doesn't make much sense to me and refers to >>>>> ppc specific stuff in a shared file: >>>>> >>>>> if (is_volatile) { >>>>> ! if (!is_store) { >>>>> insert_mem_bar(Op_MemBarAcquire); >>>>> ! } else { >>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>> insert_mem_bar(Op_MemBarVolatile); >>>>> + #endif >>>>> + } >>>>> >>>>> I don't think the comment is needed. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks for your comments! >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Wednesday, 15 January 2014 01:55 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>> 'hotspot-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi Goetz, >>>>>> >>>>>> Sorry for the delay in getting back to this. >>>>>> >>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>> specifically it is >>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>> the commentary excessive, particularly for shared code. In particular >>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>> explanation and I don't think we need it to that level of detail. >>>>>> Seems >>>>>> to me that what is present in globalDefinitions_ppc.hpp is quite adequate. >>>>>> >>>>>> The changes related to volatile writes in the constructor, as >>>>>> discussed >>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>> PPC64 porters.
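The barrier placement under discussion ("lwsync-store, sync-load-acquire" when CPU_NOT_MULTIPLE_COPY_ATOMIC is defined, store followed by a full barrier otherwise) can be sketched outside HotSpot with standard C++ atomics. This is only an analogy under stated assumptions: `volatile_load` and `volatile_store` are hypothetical helpers, `std::atomic_thread_fence` stands in for `OrderAccess::fence()` / `MemBarVolatile`, and the constant is defined locally here rather than in globalDefinitions.hpp.

```cpp
#include <atomic>

// Compile-time constant in the style proposed in the thread. Defined locally
// for this sketch; in HotSpot it would come from a shared header.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Volatile-style load: on a CPU that is not multiple-copy-atomic the full
// fence goes *before* the load; the load itself has acquire semantics.
template <typename T>
T volatile_load(const std::atomic<T>& field) {
    if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    return field.load(std::memory_order_acquire);
}

// Volatile-style store: release store, followed by a full fence only when
// the fence has not been moved in front of the loads.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
    field.store(value, std::memory_order_release);
    if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
}
```

Because the condition is a constant known at compile time, an optimizing compiler can be expected to remove the untaken branch, so the positive test `if (support_IRIW_...)` avoids the `#ifndef` double negation without a runtime cost.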
>>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>> this version. >>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Lindenmaier, Goetz >>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>> To: David Holmes >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi David, >>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>> not >>>>>>>> required to pass. >>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>> worked >>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>> it by >>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>> part is >>>>>>> not that performance relevant. >>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>> I added a compile-time guard in this new webrev: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>> several double negations I don't like, (#ifNdef >>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>> but this way I only have to change the ppc platform. >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz >>>>>>> >>>>>>> P.S.: I will also be available over the Christmas period. 
>>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>> emails. >>>>>>> >>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>> settled. >>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>> >>>>>>>> About our change: David, the causality is the other way round. >>>>>>>> The change is about IRIW. >>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>> >>>>>>> This is the part I still have some question marks over as the >>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>> But I'm >>>>>>> no further along in processing that paper I'm afraid. >>>>>>> >>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>> stores. >>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>> constructor tests. >>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>> after stores >>>>>>>> to pass the volatile constructor tests. >>>>>>> >>>>>>> So we can at least undo #4 now we have established those tests >>>>>>> were not >>>>>>> required to pass. >>>>>>> >>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>> order >>>>>>>> instructions are not as find-granular as the >>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>> The only instruction >>>>>>>> on PPC that does StoreLoad is sync. 
But sync also does StoreStore, >>>>>>>> therefore the >>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>> proper representation >>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>> it's pointless >>>>>>>> anyways. >>>>>>>> >>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>> I'd be happy to add a property >>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>> >>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>> think >>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>> based not architecture based) as that will allows for turning this >>>>>>> on/off for any architecture for testing purposes. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>> Or also >>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>> >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> TL;DR version: >>>>>>>> >>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>> constructor-barrier >>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>> *may* >>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>> ensure >>>>>>>> you can't see uninitialized fields. 
So the tests for this are >>>>>>>> invalid >>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>> need it >>>>>>>> due to other factors). >>>>>>>> >>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>> term :) >>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>> >>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>> visible to >>>>>>>> all other hardware threads at the same time point; these >>>>>>>> architectures >>>>>>>> are not multiple-copy atomic." >>>>>>>> >>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>> ARM and >>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>> current issue is what you wrote below: >>>>>>>> >>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>> > remove the sync instruction from behind stores >>>>>>>> (parse3.cpp:320) >>>>>>>> > and place it before loads. >>>>>>>> >>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>> after >>>>>>>> the store then you have to do something around the loads to get the >>>>>>>> same >>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>> load - >>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>> clearer. >>>>>>>> >>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>> hooks are present. 
It would be better if volatile_load and >>>>>>>> volatile_store were factored out so that they could be >>>>>>>> implemented as >>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>> hooks >>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>> platform-specific ifdefs to handle it as per your patch. >>>>>>>> >>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>> this >>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>> >>>>>>>>> I did not yet add the >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> or >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>> to reduce #defines, as I got no further comment on that. >>>>>>>>> >>>>>>>>> >>>>>>>>> WRT the validity of the tests and the interpretation of the JMM >>>>>>>>> I do not feel in a position to contribute substantially. >>>>>>>>> >>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>> >>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>> store >>>>>>>>> and publishing the object.
So we add it again in this one case >>>>>>>>> (volatile store in constructor). >>>>>>>>> >>>>>>>>> >>>>>>>>> @David >>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>> continuous. >>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>> issue. >>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>> you >>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>> solution, >>>>>>>>> we would volunteer to implement and test it. >>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>> issues. >>>>>>>>> >>>>>>>>> @David >>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>> and >>>>>>>>>> can't find any reference to it. >>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>> ARM and >>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>> Work-Stealing for >>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>>> taskqueue problem. >>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>> >>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>> used 'read' >>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>> above). >>>>>>>>> >>>>>>>>> Best regards and thanks for all your involvement, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Wednesday, 27.
November 2013 12:53 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> -- Volatile in constructor >>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>> missing constructor barrier. >>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>> processors >>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>> Aleksey!) >>>>>>>>> >>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>> >>>>>>>>>> -- IRIW issue >>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>> a few >>>>>>>>>>> moments thought. >>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>> problem yet, >>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>> >>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>> continuous. >>>>>>>>> >>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>> different?) >>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>> I contributed a >>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>> nice, I admit. >>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>> are evaluated >>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>> >>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>> problem >>>>>>>>>> is that >>>>>>>>>> store >>>>>>>>>> sync >>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>> only >>>>>>>>>> sync >>>>>>>>>> load >>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>> pass that test. >>>>>>>>> >>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>> term and >>>>>>>>> can't find any reference to it. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>> The JMM is fine. And >>>>>>>>>> store >>>>>>>>>> MemBarVolatile >>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>> that >>>>>>>>>> do what is required. >>>>>>>>>> >>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi everybody, >>>>>>>>>>> >>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>> I'll try to answer to all in one mail. 
>>>>>>>>>>> >>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>> initializing >>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>> Shipilev >>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>> LongVolatileTest >>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>> will fail. >>>>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>>>> volatile field would be >>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>> AtomicInteger.) >>>>>>>>>> >>>>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>>>> volatiles do >>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>> that >>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>> Aleksey >>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>> missing constructor barrier. >>>>>>>>>> >>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>> heavyweight >>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>> don't >>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>> >>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>> OrderAccess operations >>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>> account >>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>> different platforms >>>>>>>>>>> just implementing something empty is not sufficient. 
>>>>>>>>>>> An example: >>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>> this: >>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>> Just doing what is required. >>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>> fine grained operations: >>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>> are more (too) powerful, >>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>> can not implement >>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>> >>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>> issue. >>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>> read >>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>> >>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>> independent? >>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>> >>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>> few >>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>> will >>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>> moment. 
The >>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>> requirements for >>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>> are not >>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>> store >>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>> of the >>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>> software >>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>> don't have >>>>>>>>>> that on ppc!) >>>>>>>>>> >>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>> I will >>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> -- Other ports: >>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>> it might >>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>> them). >>>>>>>>>>> >>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>> can >>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
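The guard proposed above can be sketched outside HotSpot. This is only an illustration, assuming the macro and constant names that come up in the January messages of this thread (CPU_NOT_MULTIPLE_COPY_ATOMIC, support_IRIW_for_not_multiple_copy_atomic_cpu); the function volatile_style_load is hypothetical, and std::atomic stands in for the OrderAccess calls. The point is that a plain `if` on a compile-time constant keeps the shared code free of #ifdefs while the compiler can still drop the fence on platforms that do not need it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a platform header: a port whose CPU is not
// multiple-copy atomic would define CPU_NOT_MULTIPLE_COPY_ATOMIC there.
// It is left undefined here, so the constant below is false.
// #define CPU_NOT_MULTIPLE_COPY_ATOMIC

#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Illustrative volatile-style load (an assumption, not HotSpot code):
// a full fence is issued before the load only when the constant asks for it.
inline int volatile_style_load(const std::atomic<int>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    // Models OrderAccess::fence() placed before the load.
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  // Models OrderAccess::load_acquire().
  return field.load(std::memory_order_acquire);
}
```

Calling volatile_style_load(field) on a std::atomic<int> returns the stored value on every platform; only the barrier placement differs with the constant.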
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>> way :) >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>> >>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>> fix an >>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>> IRIW >>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>> doing the >>>>>>>>>>> load. >>>>>>>>>>> >>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>> would be >>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>> performance >>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>> that >>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>> this is >>>>>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>>>>> with the >>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>> the >>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>> properties >>>>>>>>>>> independent of architecture. 
If such a "barrier" is not needed >>>>>>>>>>> on a >>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>> reduce >>>>>>>>>>> to a no-op. >>>>>>>>>>> >>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>> under >>>>>>>>>>> the JMM. >>>>>>>>>>> >>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>> current >>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>> is set >>>>>>>>>>>> depending on platform: >>>>>>>>>>>> >>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>> >>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>> Java >>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>> explicitly >>>>>>>>>>> like this. >>>>>>>>>>> >>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>> there >>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>> defines? >>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>> David? >>>>>>>>>>> >>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>> inside else {}. >>>>>>>>>>>> >>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>> specific which >>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>> #ifdefs. 
>>>>>>>>>>>>
>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>
>>>>>>>>>>>>   if (is_vol) {
>>>>>>>>>>>>     // See comment in do_get_xxx().
>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>     insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>> #else
>>>>>>>>>>>>     if (is_field) {
>>>>>>>>>>>>       // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>>>>>>>>       set_wrote_volatile(true);
>>>>>>>>>>>>     }
>>>>>>>>>>>> #endif
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>                        volatile x=0, y=0
>>>>>>>>>>>>>  __________    __________    __________    __________
>>>>>>>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>>>>>>
>>>>>>>>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>>>>>>>                 read(y)                     read(x)
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Disallowed:    x=1, y=0                    y=1, x=0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>> >>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>> fields >>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>> published. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>> seems too >>>>>>>>>>>>> strong. >>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>> suffice. >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> From vladimir.kozlov at oracle.com Sat Jan 18 09:36:04 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 18 Jan 2014 09:36:04 -0800 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8D566@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52D9D35F.2050501@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D566@DEWDFEMB12A.global.corp.sap> Message-ID: <52DABB84.1090400@oracle.com> Good. 
Thanks, Vladimir On 1/18/14 2:36 AM, Lindenmaier, Goetz wrote: > Hi, > > I updated the comments in the webrev accordingly. > http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ > > Best regards, > Goetz. > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Saturday, January 18, 2014 2:06 AM > To: Lindenmaier, Goetz; David Holmes > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Goetz, > > I asked to remove both comments in parse.hpp, comments in parse1.cpp and > parse3.cpp is enough. It should be similar to _wrote_final: > > bool _wrote_volatile; // Did we write a final field? > > > I think the next comment in parse3.cpp should be modified: > > + // But remember we wrote a volatile field so that a barrier can be > issued > + // in constructors. See do_exits() in parse1.cpp. > > // Remember we wrote a volatile field. > // For not multiple copy atomic cpu (ppc64) a barrier should be issued > // in constructors which have such stores. See do_exits() in parse1.cpp. > > Thanks, > Vladimir > > On 1/17/14 12:39 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> I tried to come up with a webrev that implements the change as proposed in >> your mails: >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >> >> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >> support_IRIW_for_not_multiple_copy_atomic_cpu. >> >> I left the definition and handling of _wrote_volatile in the code, without >> any protection. >> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >> and put it on one line. >> >> I removed the comment in library_call.cpp. >> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >> from the comments as it's PPC specific. >> >> Wrt. to C1: we plan to port C1 to PPC64, too. 
During that task, we will fix these >> issues in C1 if nobody did it by then. >> >> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >> performance problems arise, we still can add #ifdef PPC64 to circumvent this. >> >> Best regards, >> Goetz. >> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 16. Januar 2014 10:05 >> To: Vladimir Kozlov >> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>> On 1/16/14 12:34 AM, David Holmes wrote: >>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>> want to keep it this way, it could be useful to have such info on other >>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>> parse.hpp. >>>>> >>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>> which could be checked in all places instead of #ifdef: >>>> >>>> I asked for the ifdef some time back as I find it much preferable to >>>> have this as a build-time construct rather than a >>>> runtime one. I don't want to have to pay anything for this if we don't >>>> use it. >>> >>> Any decent C++ compiler will optimize expressions with such constants >>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >> >> If you insist then we may as well do it all the same way. Better to be >> consistent. >> >> My apologies Goetz for wasting your time going back and forth on this. 
>>
>> That aside I have a further concern with this IRIW support - it is
>> incomplete as there is no C1 support, as PPC64 isn't using client. If
>> this is going on then we (which probably means the Oracle 'we') need to
>> add the missing C1 code.
>>
>> David
>> -----
>>
>>> Vladimir
>>>
>>>>
>>>> David
>>>>
>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>>> #else
>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>>> #endif
>>>>>
>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>>
>>>>> and then:
>>>>>
>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>>>   oop p = JNIHandles::resolve(obj); \
>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) OrderAccess::fence(); \
>>>>>   volatile type_name v = OrderAccess::load_acquire((volatile type_name*)index_oop_from_field_offset_long(p, offset));
>>>>>
>>>>> And:
>>>>>
>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && field->is_volatile()) {
>>>>> +   insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>>>>> + }
>>>>>
>>>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>>>
>>>>> The code in parse1.cpp could be put on one line:
>>>>>
>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && method()->is_initializer()) )) {
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I updated the webrev:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>
>>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>>>   reflect the new flagging. Also I mention that we support the
>>>>>>>   volatile constructor issue, but that it's not standard.
>>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>>>   I also think it's better to separate these this way.
>>>>>>
>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>>> please.
>>>>>>
>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>>>> comment doesn't make much sense to me and refers to
>>>>>> ppc specific stuff in a shared file:
>>>>>>
>>>>>>     if (is_volatile) {
>>>>>> !     if (!is_store) {
>>>>>>         insert_mem_bar(Op_MemBarAcquire);
>>>>>> !     } else {
>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>> !       // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>>>>>         insert_mem_bar(Op_MemBarVolatile);
>>>>>> + #endif
>>>>>> +     }
>>>>>>
>>>>>> I don't think the comment is needed.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks for your comments!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Hi Goetz,
>>>>>>>
>>>>>>> Sorry for the delay in getting back to this.
>>>>>>>
>>>>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>>>>> specifically it is
>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>>>>> the commentary excessive, particularly for shared code. In particular
>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>>>>> explanation and I don't think we need it to that level of detail.
>>>>>>> Seems
>>>>>>> to me that what is present in globalDefinitions_ppc.hpp is quite adequate.
>>>>>>> >>>>>>> The changes related to volatile writes in the constructor, as >>>>>>> discussed >>>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>> PPC64 porters. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>> this version. >>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Lindenmaier, Goetz >>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>>> To: David Holmes >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi David, >>>>>>>> >>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>> not >>>>>>>>> required to pass. >>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>> worked >>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>> it by >>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>>> part is >>>>>>>> not that performance relevant. 
>>>>>>>> >>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>> think >>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>> but this way I only have to change the ppc platform. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz >>>>>>>> >>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>>> emails. >>>>>>>> >>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>> settled. >>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>> >>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>> The change is about IRIW. >>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>> >>>>>>>> This is the part I still have some question marks over as the >>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>> But I'm >>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>> >>>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>>> stores. >>>>>>>>> 3. 
If we don't do them after stores, we fail the volatile >>>>>>>>> constructor tests. >>>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>>> after stores >>>>>>>>> to pass the volatile constructor tests. >>>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests >>>>>>>> were not >>>>>>>> required to pass. >>>>>>>> >>>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>>> order >>>>>>>>> instructions are not as fine-grained as the >>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>>> The only instruction >>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>>>> therefore the >>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>>> proper representation >>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>>> it's pointless >>>>>>>>> anyways. >>>>>>>>> >>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>>> I'd be happy to add a property >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>>> based not architecture based) as that will allow for turning this >>>>>>>> on/off for any architecture for testing purposes. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>>> Or also >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Donnerstag, 28.
November 2013 00:34 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> TL;DR version: >>>>>>>>> >>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>> constructor-barrier >>>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>>> *may* >>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>> ensure >>>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>>> invalid >>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>> need it >>>>>>>>> due to other factors). >>>>>>>>> >>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>> term :) >>>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>>> >>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>> visible to >>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>> architectures >>>>>>>>> are not multiple-copy atomic." >>>>>>>>> >>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>> ARM and >>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>>> current issue is what you wrote below: >>>>>>>>> >>>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>>> > remove the sync instruction from behind stores >>>>>>>>> (parse3.cpp:320) >>>>>>>>> > and place it before loads. >>>>>>>>> >>>>>>>>> I hadn't grasped this part. 
Obviously if you fail to do the sync >>>>>>>>> after >>>>>>>>> the store then you have to do something around the loads to get the >>>>>>>>> same >>>>>>>>> results! I still don't know what led you to the conclusion that the >>>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>>> load - >>>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>>> clearer. >>>>>>>>> >>>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>>> volatile_store were factored out so that they could be >>>>>>>>> implemented as >>>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>>> hooks >>>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>>> platform-specific ifdefs to handle it as per your patch. >>>>>>>>> >>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>>> this >>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>> >>>>>>>>>> I did not yet add the >>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>> or >>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>> to reduce #defines, as I got no further comment on that.
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>> >>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>> >>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>>> store >>>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>>> (volatile store in constructor). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> @David >>>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>> continuous. >>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>>> issue. >>>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>>> you >>>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>>> solution, >>>>>>>>>> we would volunteer to implement and test it. >>>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>>> issues. >>>>>>>>>> >>>>>>>>>> @David >>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>>> and >>>>>>>>>>> can't find any reference to it. 
>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>>> ARM and >>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>>> Work-Stealing for >>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>>>> taskqueue problem. >>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>>> >>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>>> used 'read' >>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>>> above). >>>>>>>>>> >>>>>>>>>> Best regards and thanks for all your involvements, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> -- Volatile in constructor >>>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>>> processors >>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>>> Aleksey!) >>>>>>>>>> >>>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>>> >>>>>>>>>>> -- IRIW issue >>>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>>> a few >>>>>>>>>>>> moments thought. >>>>>>>>>>> Sure.
There also is no solution as you require for the taskqueue >>>>>>>>>>> problem yet, >>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>> >>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>> continuous. >>>>>>>>>> >>>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>> different?) >>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>>> I contributed a >>>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>>> nice, I admit. >>>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>>> are evaluated >>>>>>>>>>> by the C-compiler I'm happy. >>>>>>>>>>> >>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>> problem >>>>>>>>>>> is that >>>>>>>>>>> store >>>>>>>>>>> sync >>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>> only >>>>>>>>>>> sync >>>>>>>>>>> load >>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>> pass that test. >>>>>>>>>> >>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>> term and >>>>>>>>>> can't find any reference to it. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>> The JMM is fine. And >>>>>>>>>>> store >>>>>>>>>>> MemBarVolatile >>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>> that >>>>>>>>>>> do what is required. >>>>>>>>>>> >>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. 
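The IRIW pattern under discussion can be written as a small litmus test. The following is a minimal sketch in C++, assuming that sequentially consistent std::atomic accesses are an acceptable stand-in for Java volatile fields; it is not the VolatileIRIWTest from the torture test suite, and the function name count_forbidden_iriw_outcomes is made up for this illustration. Two threads write x and y independently, and two readers read them in opposite order. Under sequential consistency the readers must never disagree about which write came first.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the four-thread IRIW litmus test `rounds` times and returns how often
// the two readers disagreed about the order of the two independent writes.
int count_forbidden_iriw_outcomes(int rounds) {
  int forbidden = 0;
  for (int i = 0; i < rounds; ++i) {
    std::atomic<int> x(0), y(0);
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    // Thread 0: write(x=1)
    std::thread writer_x([&] { x.store(1, std::memory_order_seq_cst); });
    // Thread 1: read(x), then read(y)
    std::thread reader_xy([&] {
      r1 = x.load(std::memory_order_seq_cst);
      r2 = y.load(std::memory_order_seq_cst);
    });
    // Thread 2: write(y=1)
    std::thread writer_y([&] { y.store(1, std::memory_order_seq_cst); });
    // Thread 3: read(y), then read(x)
    std::thread reader_yx([&] {
      r3 = y.load(std::memory_order_seq_cst);
      r4 = x.load(std::memory_order_seq_cst);
    });

    writer_x.join();
    reader_xy.join();
    writer_y.join();
    reader_yx.join();

    // Disallowed: Thread 1 sees x=1, y=0 while Thread 3 sees y=1, x=0.
    if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
      ++forbidden;
    }
  }
  return forbidden;
}
```

A conforming implementation should return 0 for any number of rounds. How the compiler achieves that on a machine that is not multiple-copy atomic (for example by placing a heavyweight barrier before the loads) is exactly the implementation question debated in this thread.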
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Hi Goetz, >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi everybody, >>>>>>>>>>>> >>>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>>> I'll try to answer to all in one mail. >>>>>>>>>>>> >>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>>> initializing >>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>>> Shipilev >>>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>>> LongVolatileTest >>>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>>> will fail. >>>>>>>>>>>> (In addition, observing 0 instead of the initial value of a >>>>>>>>>>>> volatile field would be >>>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>>> AtomicInteger.) >>>>>>>>>>> >>>>>>>>>>> The effects of unsafe publication are always surprising - >>>>>>>>>>> volatiles do >>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>>> that >>>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>>> Aleksey >>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>> >>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>>> heavyweight >>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>>> don't >>>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>>> >>>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>>> OrderAccess operations >>>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>>> account >>>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>>> different platforms >>>>>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>>>>> An example: >>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>> this: >>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>> fine grained operations: >>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>> can not implement >>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>> >>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>> issue. >>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>> read >>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. 
>>>>>>>>>>>> >>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>> independent? >>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>> >>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>> few >>>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>>> will >>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>>> moment. The >>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>> requirements for >>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>> are not >>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>> store >>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>> of the >>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>> software >>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>> don't have >>>>>>>>>>> that on ppc!) >>>>>>>>>>> >>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>> I will >>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> -- Other ports: >>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>> it might >>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>> them). 
>>>>>>>>>>>> >>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>> can >>>>>>>>>>>> prepare an extra webrev for hotspot-comp. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>> way :) >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>> >>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>> fix an >>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>> IRIW >>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>> doing the >>>>>>>>>>>> load. >>>>>>>>>>>> >>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>> would be >>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>> performance >>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>> that >>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>> this is >>>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>>> with the >>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>> the >>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>> properties >>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>> on a >>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>> reduce >>>>>>>>>>>> to a no-op. >>>>>>>>>>>> >>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>> under >>>>>>>>>>>> the JMM. >>>>>>>>>>>> >>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>> current >>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>> is set >>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>> >>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>> >>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>> >>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>> Java >>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>> explicitly >>>>>>>>>>>> like this. >>>>>>>>>>>> >>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>>> there >>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>> defines? >>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>> David? >>>>>>>>>>>> >>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>>> specific which >>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>>> #ifdefs. >>>>>>>>>>>>> >>>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>>> >>>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>>> #else >>>>>>>>>>>>> if (is_field) { >>>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>>> volatile field >>>>>>>>>>>>> (PPC64). >>>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Vladimir >>>>>>>>>>>>> >>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> Example: >>>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>>> >>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>>> is only >>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>>> threads >>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. 
>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>>> fields >>>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>>> published. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>>> seems too >>>>>>>>>>>>>> strong. >>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>>> suffice. >>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>> From david.holmes at oracle.com Sun Jan 19 18:44:57 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 20 Jan 2014 12:44:57 +1000 Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8D010@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> <52D821CB.6020207@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D010@DEWDFEMB12A.global.corp.sap> Message-ID: <52DC8DA9.2010106@oracle.com> On 17/01/2014 11:30 PM, Lindenmaier, Goetz wrote: > Hi, > > I had a look at the first part of this issue: Whether StoreStore > is necessary in the interpreter. Let's for now assume the serialization > page mechanism works on PPC. 
> > In the state transition leaving the VM state, which is executed in the > destructor, ThreadStateTransition::transition() is called, which executes > if (UseMembar) { > OrderAccess::fence(); > } else { > os::write_memory_serialize_page(thread); > } > > os:: write_memory_serialize_page() can not be considered a proper > MemBar, as it only serializes if another thread poisoned the page. > Thus it does not qualify to order the initialization and the publishing > of the object. > > You are right, if UseMembar is true, the StoreStore in the interpreter > is superfluous. We could guard the StoreStores in the interpreter by > !UseMembar. My understanding, from our existing non-TSO system ports, is that the present assumption is that either: a) you have a TSO system, in which case you are probably using the serialization page, but you don't need any barrier to enforce ordering anyway; or b) you don't have a TSO system, you are using UseMembar==true and so you get a full fence inserted that enforces the ordering anyway. So the ordering requirements are satisfied by piggy-backing on the UseMembar setting that comes from the thread state transition code, which forms part of the "runtime entry" code. That's not to say that you will necessarily find this applied consistently in all places where it might be applied - nor will you necessarily find that this is common knowledge amongst VM engineers. Technically the storeStore barriers could be conditional on !UseMembar but that is redundant in the current usage. > But then again, one is to order the publishing of the thread states, > the other to enforce some Java semantics. I don't know whether everybody > who changes in one place is aware of both issues. But if you want to, > I'll add a !UseMembar in the interpreter. Here are my preferred options in order: 1. Set UseMembar==true on PPC64 and drop these new storeStore barriers - rely on the piggy-backing effect. 2. Conditionalize the new storeStore barriers on !UseMembar. 
This unfortunately penalizes all platforms with a runtime check. 3. Add the storeStores unconditionally. This penalizes platforms that set UseMembar==true as we will now get two fences at runtime. I know we're talking about the interpreter here so performance is not exactly critical, but still ... > Maybe it would be a good idea > to document the double use in interfaceSupport.cpp, too. And maybe > add an assertion of some kind. interfaceSupport doesn't know that other code piggy-backs on the fact that state transitions have full fences when UseMembar is true. If it is documented anywhere it should be in the interpreter (and any other place that makes the same assumption) - something like: // On non-TSO systems there can be additional ordering constraints // between Java-level actions (such as allocation and constructor // invocation) that in principle need explicit memory barriers. // However, on many non-TSO systems the thread-state transition logic // in the IRT_ENTRY code will insert a full fence due to the use of // UseMembar==true, which provides the necessary ordering guarantees. > > We're digging into the other issue currently, whether the serialization page > works on ppc. We understand your concerns and have no simple answer to > it right now. At least, in our VM and in the port there are no known problems > with the state transitions. Even if the memory serialization page does not work, in a guaranteed sense, on PPC-AIX, it is extremely unlikely that testing would expose this. Also note that the memory serialization page behaviour is more a function of the OS - so it may be that AIX is different to Linux in that regard. Cheers, David ----- > Best regards, > Goetz. > > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 16. 
Januar 2014 19:16 > To: David Holmes; Lindenmaier, Goetz > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers' > Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization > > Changes are in C++ Interpreter so it does not affect Oracle VM. > But David has a point here. I would like to hear the explanation too. > > BTW, I see that for ppc64: > > src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false); > > as a result write_memory_serialize_page() is used in > ThreadStateTransition::transition(). > > Is it not enough on PPC64? > > Thanks, > Vladimir > > On 1/15/14 9:30 PM, David Holmes wrote: >> Can I get some response on this please - specifically the redundancy wrt >> IRT_ENTRY actions. >> >> Thanks, >> David >> >> On 20/12/2013 11:55 AM, David Holmes wrote: >>> Still catching up ... >>> >>> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> this change adds StoreStore barriers after object initialization and >>>> after constructor calls in the C++ interpreter. This assures no >>>> uninitialized >>>> objects or final fields are visible. >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/ >>> >>> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize >>> thread state transitions that already include a full "fence" so the >>> storestore barriers are redundant in those cases. >>> >>> The fastpath _new storestore seems okay. >>> >>> I don't know how handle_return gets used to know if it is reasonable or >>> not. >>> >>> I was trying, unsuccessfully, to examine the same code in the >>> templateInterpreter to see how it handles these cases as it naturally >>> has the same object-initialization-safety requirements (though these can >>> be handled in a number of different ways other than an unconditional >>> storestore barrier at the end of the initialization and construction >>> phases). 
>>> >>> David >>> ----- >>> >>>> Please review and test this change. >>>> >>>> Best regards, >>>> Goetz. >>>> From volker.simonis at gmail.com Mon Jan 20 00:23:35 2014 From: volker.simonis at gmail.com (volker.simonis at gmail.com) Date: Mon, 20 Jan 2014 08:23:35 +0000 Subject: hg: ppc-aix-port/stage-9/jdk: 8031134: PPC64: implement printing on AIX Message-ID: <20140120082348.0329A625A6@hg.openjdk.java.net> Changeset: e11c60a3eefb Author: simonis Date: 2014-01-20 09:20 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jdk/rev/e11c60a3eefb 8031134: PPC64: implement printing on AIX Reviewed-by: prr ! src/solaris/classes/sun/print/UnixPrintService.java ! src/solaris/classes/sun/print/UnixPrintServiceLookup.java From volker.simonis at gmail.com Mon Jan 20 00:27:05 2014 From: volker.simonis at gmail.com (volker.simonis at gmail.com) Date: Mon, 20 Jan 2014 08:27:05 +0000 Subject: hg: ppc-aix-port/stage-9/jdk: 8031997: PPC64: Make the various POLL constants system dependant Message-ID: <20140120082718.8F3FF625A7@hg.openjdk.java.net> Changeset: 4231d71b18cf Author: simonis Date: 2014-01-20 09:24 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jdk/rev/4231d71b18cf 8031997: PPC64: Make the various POLL constants system dependant Reviewed-by: alanb ! make/mapfiles/libnio/mapfile-linux ! make/mapfiles/libnio/mapfile-macosx ! make/mapfiles/libnio/mapfile-solaris ! src/aix/classes/sun/nio/ch/AixPollPort.java ! src/macosx/classes/sun/nio/ch/KQueueArrayWrapper.java ! src/share/classes/sun/nio/ch/AbstractPollArrayWrapper.java ! src/share/classes/sun/nio/ch/DatagramChannelImpl.java ! src/share/classes/sun/nio/ch/DatagramSocketAdaptor.java ! src/share/classes/sun/nio/ch/Net.java ! src/share/classes/sun/nio/ch/ServerSocketAdaptor.java ! src/share/classes/sun/nio/ch/ServerSocketChannelImpl.java ! src/share/classes/sun/nio/ch/SocketAdaptor.java ! src/share/classes/sun/nio/ch/SocketChannelImpl.java ! src/solaris/classes/sun/nio/ch/EPollPort.java ! 
src/solaris/classes/sun/nio/ch/KQueuePort.java ! src/solaris/classes/sun/nio/ch/PollArrayWrapper.java ! src/solaris/classes/sun/nio/ch/Port.java ! src/solaris/classes/sun/nio/ch/SinkChannelImpl.java ! src/solaris/classes/sun/nio/ch/SourceChannelImpl.java ! src/solaris/classes/sun/nio/ch/UnixAsynchronousServerSocketChannelImpl.java ! src/solaris/classes/sun/nio/ch/UnixAsynchronousSocketChannelImpl.java ! src/solaris/classes/sun/nio/ch/sctp/SctpChannelImpl.java ! src/solaris/classes/sun/nio/ch/sctp/SctpMultiChannelImpl.java ! src/solaris/classes/sun/nio/ch/sctp/SctpServerChannelImpl.java ! src/solaris/native/sun/nio/ch/IOUtil.c ! src/solaris/native/sun/nio/ch/Net.c ! src/windows/classes/sun/nio/ch/PollArrayWrapper.java ! src/windows/classes/sun/nio/ch/SinkChannelImpl.java ! src/windows/classes/sun/nio/ch/SourceChannelImpl.java ! src/windows/classes/sun/nio/ch/WindowsSelectorImpl.java ! src/windows/native/sun/nio/ch/Net.c ! src/windows/native/sun/nio/ch/WindowsSelectorImpl.c ! src/windows/native/sun/nio/ch/nio_util.h From volker.simonis at gmail.com Mon Jan 20 00:18:33 2014 From: volker.simonis at gmail.com (volker.simonis at gmail.com) Date: Mon, 20 Jan 2014 08:18:33 +0000 Subject: hg: ppc-aix-port/stage-9/jdk: 8028537: PPC64: Updated the JDK regression tests to run on AIX Message-ID: <20140120081918.D1D1D625A4@hg.openjdk.java.net> Changeset: f04b825b1c0c Author: simonis Date: 2014-01-17 21:54 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/jdk/rev/f04b825b1c0c 8028537: PPC64: Updated the JDK regression tests to run on AIX Reviewed-by: alanb Contributed-by: luchsh at linux.vnet.ibm.com, spoole at linux.vnet.ibm.com, volker.simonis at gmail.com ! test/ProblemList.txt ! test/com/sun/corba/5036554/TestCorbaBug.sh ! test/com/sun/corba/cachedSocket/7056731.sh ! test/com/sun/java/swing/plaf/windows/8016551/bug8016551.java ! test/com/sun/jdi/ImmutableResourceTest.sh ! test/com/sun/jdi/JITDebug.sh ! test/com/sun/jdi/PrivateTransportTest.sh ! 
test/com/sun/jdi/ShellScaffold.sh ! test/com/sun/jdi/connect/spi/JdiLoadedByCustomLoader.sh ! test/java/awt/Toolkit/AutoShutdown/ShowExitTest/ShowExitTest.sh ! test/java/awt/appletviewer/IOExceptionIfEncodedURLTest/IOExceptionIfEncodedURLTest.sh ! test/java/io/Serializable/evolution/RenamePackage/run.sh ! test/java/io/Serializable/serialver/classpath/run.sh ! test/java/io/Serializable/serialver/nested/run.sh ! test/java/lang/ClassLoader/deadlock/TestCrossDelegate.sh ! test/java/lang/ClassLoader/deadlock/TestOneWayDelegate.sh ! test/java/lang/StringCoding/CheckEncodings.sh ! test/java/lang/annotation/loaderLeak/LoaderLeak.sh ! test/java/lang/instrument/appendToClassLoaderSearch/CommonSetup.sh ! test/java/lang/management/OperatingSystemMXBean/TestSystemLoadAvg.sh ! test/java/net/Authenticator/B4933582.sh ! test/java/net/DatagramSocket/Send12k.java ! test/java/net/DatagramSocket/SetDatagramSocketImplFactory/ADatagramSocket.sh ! test/java/net/Socket/OldSocketImpl.sh ! test/java/net/URL/B5086147.sh ! test/java/net/URLClassLoader/B5077773.sh ! test/java/net/URLClassLoader/sealing/checksealed.sh ! test/java/net/URLConnection/6212146/test.sh ! test/java/nio/charset/coders/CheckSJISMappingProp.sh ! test/java/nio/charset/spi/basic.sh ! test/java/nio/file/Files/SBC.java ! test/java/nio/file/Files/walkFileTree/find.sh ! test/java/rmi/activation/Activatable/extLoadedImpl/ext.sh ! test/java/rmi/registry/readTest/readTest.sh ! test/java/security/Security/ClassLoaderDeadlock/ClassLoaderDeadlock.sh ! test/java/security/Security/ClassLoaderDeadlock/Deadlock.sh ! test/java/security/Security/ClassLoaderDeadlock/Deadlock2.sh ! test/java/security/Security/signedfirst/Dyn.sh ! test/java/security/Security/signedfirst/Static.sh ! test/java/util/Currency/PropertiesTest.sh ! test/java/util/Locale/LocaleCategory.sh ! test/java/util/Locale/LocaleProviders.sh ! test/java/util/PluggableLocale/ExecTest.sh ! test/java/util/ResourceBundle/Bug6299235Test.sh ! test/java/util/ServiceLoader/basic.sh ! 
test/java/util/logging/AnonLoggerWeakRefLeak.sh ! test/java/util/logging/LoggerWeakRefLeak.sh ! test/java/util/prefs/CheckUserPrefsStorage.sh ! test/javax/crypto/SecretKeyFactory/FailOverTest.sh ! test/javax/imageio/metadata/IIOMetadataFormat/runMetadataFormatTest.sh ! test/javax/imageio/metadata/IIOMetadataFormat/runMetadataFormatThreadTest.sh ! test/javax/imageio/stream/StreamCloserLeak/run_test.sh ! test/javax/script/CommonSetup.sh ! test/javax/security/auth/Subject/doAs/Test.sh ! test/lib/security/java.policy/Ext_AllPolicy.sh ! test/sun/management/jmxremote/bootstrap/GeneratePropertyPassword.sh ! test/sun/net/ftp/MarkResetTest.sh ! test/sun/net/www/http/HttpClient/RetryPost.sh ! test/sun/net/www/protocol/jar/B5105410.sh ! test/sun/net/www/protocol/jar/jarbug/run.sh ! test/sun/rmi/rmic/newrmic/equivalence/batch.sh ! test/sun/security/krb5/runNameEquals.sh ! test/sun/security/pkcs11/Provider/ConfigQuotedString.sh ! test/sun/security/pkcs11/Provider/Login.sh ! test/sun/security/provider/KeyStore/DKSTest.sh ! test/sun/security/provider/PolicyFile/getinstance/getinstance.sh ! test/sun/security/ssl/com/sun/net/ssl/internal/ssl/EngineArgs/DebugReportsOneExtraByte.sh ! test/sun/security/ssl/com/sun/net/ssl/internal/ssl/SSLSocketImpl/NotifyHandshakeTest.sh ! test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/PostThruProxy.sh ! test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/PostThruProxyWithAuth.sh ! test/sun/security/tools/jarsigner/AlgOptions.sh ! test/sun/security/tools/jarsigner/PercentSign.sh ! test/sun/security/tools/jarsigner/diffend.sh ! test/sun/security/tools/jarsigner/oldsig.sh ! test/sun/security/tools/keytool/AltProviderPath.sh ! test/sun/security/tools/keytool/CloneKeyAskPassword.sh ! test/sun/security/tools/keytool/NoExtNPE.sh ! test/sun/security/tools/keytool/SecretKeyKS.sh ! test/sun/security/tools/keytool/StandardAlgName.sh ! test/sun/security/tools/keytool/StorePasswordsByShell.sh ! 
test/sun/security/tools/keytool/printssl.sh ! test/sun/security/tools/keytool/resource.sh ! test/sun/security/tools/keytool/standard.sh ! test/sun/security/tools/policytool/Alias.sh ! test/sun/security/tools/policytool/ChangeUI.sh ! test/sun/security/tools/policytool/OpenPolicy.sh ! test/sun/security/tools/policytool/SaveAs.sh ! test/sun/security/tools/policytool/UpdatePermissions.sh ! test/sun/security/tools/policytool/UsePolicy.sh ! test/sun/security/tools/policytool/i18n.sh ! test/sun/tools/common/CommonSetup.sh ! test/sun/tools/jconsole/ResourceCheckTest.sh ! test/sun/tools/jinfo/Basic.sh ! test/sun/tools/native2ascii/resources/ImmutableResourceTest.sh ! test/tools/launcher/ExecutionEnvironment.java ! test/tools/launcher/Settings.java ! test/tools/launcher/TestHelper.java From volker.simonis at gmail.com Mon Jan 20 01:59:13 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 20 Jan 2014 10:59:13 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <52D50118.3080000@oracle.com> Message-ID: On Fri, Jan 17, 2014 at 10:15 PM, Volker Simonis wrote: > On Tue, Jan 14, 2014 at 10:19 AM, Alan Bateman wrote: >> On 14/01/2014 08:40, Volker Simonis wrote: >>> >>> Hi, >>> >>> could you please review the following changes for the ppc-aix-port >>> stage/stage-9 repositories (the changes are planned for integration into >>> ppc-aix-port/stage-9 and subsequent backporting to ppc-aix-port/stage): >> >> I'd like to review this but I won't have time until later in the week. From >> an initial look then there are a few things are not pretty (the changes to >> fix the AIX problems with I/O cancellation in particular) and I suspect that >> some refactoring is going to be required to handle some of this cleanly. A >> minor comment is that bug synopsis doesn't really communicate what these >> changes are about. >> >> -Alan. 
> > Just forwarded the following message from another thread here where it belongs: > > On 17/01/2014 16:57, Alan Bateman wrote: > > I've finally got to this one. As the event translation issue is now a > separate issue, I've ignored that part. > > I'm not comfortable with the changes to FileDispatcherImpl.c as I > don't think we should be calling into IO_ or NET_* functions here. > I think I get the issue that you have on AIX (and assume it's the > preClose/dup2 that blocks rather than close) but need a bit of time to > suggest alternatives. It may be that it will require an AIX specific > SocketDispatcher. Do you happen to know which tests fail due to this > part? > > The other changes look okay. There is a typo in the change to > zip_util.c, s/legel/legal/. > > In DatagramChannelImpl.c you handle connect failing with > EAFNOSUPPORT. I would be tempted to replace the comment to say that > EAFNOSUPPORT can be ignored on AIX. A minor comment but the > indentation for rv = errno can be fixed (I see the BSD code has it > wrong too). > On 17/01/2014 21:23, Volker Simonis wrote: > > > You're right, one race is with preClose/dup2 but also with other calls > > like read/fcntl/... > > > > There were several tests that failed and once I fixed it they all > > succeeded. But I can recreate some of the failures for you. The > > symptoms are always the same: the VM is locked. If you trigger a stack > > trace you can see that at least one thread is blocked in an I/O > > operation on a file descriptor like fcntl (e.g. for file locking), > > read, etc. while another thread is trying to close that socket. > > > > As it happens, we have some carry-over issues from the Mac port, > one of which is that async close of FileChannels will block > indefinitely in dup2 when there is another thread blocked (on > fcntl or reading from a pipe ...). I haven't had time to work on > it but this discussion has reminded me that we need to sort it > out. 
I've put a preliminary webrev with the changes here: > > http://cr.openjdk.java.net/~alanb/7133499/webrev/ > > The important part is that it's using a signal consistently on > Linux/Solaris/OSX so that any blocked threads are interrupted. My > guess is that if NativeThread.c is updated to define a signal on > AIX then this should resolve some of the issues on AIX. > > I would like to see the list of tests failing. If there is an > issue with dup2 with sockets (and OS X doesn't seem to have that > issue) then it will require further work but I would at least > like to start by understanding if this patch will help with the > FileChannel issues. Hi Alan, yes, that's interesting. Sounds like a very similar problem on Mac. I would suggest the following: I'll cut out the "Async Close AIX FIX" stuff from this change (i.e. "8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests") and send out a new webrev for the remaining part. I think that the remaining part was more or less reviewed and we can then push it faster. In the meantime, I'll recheck which tests exactly fail without my "Async Close AIX FIX" stuff and which of these tests will be fixed by your 7133499 webrev. Maybe we can really get through with it or with it and a few enhancements. I'll let you know my results later today. By the way, my webrev already contained an AixNativeThread.c implementation in src/aix/native/sun/nio/ch. The only remaining problem I see with this approach is that we would need to backport your 7133499 change to 8u-dev in the 8u20 time frame to make our AIX port work. Would this be OK for you? 
Regards, Volker From Alan.Bateman at oracle.com Mon Jan 20 03:41:48 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 20 Jan 2014 11:41:48 +0000 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <52D50118.3080000@oracle.com> Message-ID: <52DD0B7C.4090605@oracle.com> On 20/01/2014 09:59, Volker Simonis wrote: > : > Hi Alan, > > yes, that's interesting. Sounds like a very similar problem on Mac. > > I would suggest the following: > > I'll cut out the "Async Close AIX FIX" stuff from this change (i.e. > "8031581: PPC64: Addons and fixes for AIX to pass the jdk regression > tests") and send out a new webrev for the remaining part. I think that > the remaining part was more or less reviewed and we can then push it > faster. > > In the meantime, I'll recheck which tests exactly fail without my > "Async Close AIX FIX" stuff and which of these tests will be > fixed by your 7133499 webrev. Maybe we can really get through with it > or with it and a few enhancements. I'll let you know my results later > today. By the way, my webrev already contained an AixNativeThread.c > implementation in src/aix/native/sun/nio/ch. > > The only remaining problem I see with this approach is that we would > need to backport your 7133499 change to 8u-dev in the 8u20 time frame > to make our AIX port work. Would this be OK for you? > I'm okay with this plan and if you re-generate the webrev without the async close changes then I can look at it quickly so that you can get it into the stage-9 forest. As for 7133499, it would be a good candidate for 8u-dev too. I don't expect any problems, but we will need to get it approved on the jdk8u-dev list. -Alan. 
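The signal-based wakeup discussed in this thread (interrupting threads that are blocked in I/O so that an asynchronous close does not hang) can be illustrated with a small, self-contained C sketch. It is not the code from the 7133499 webrev or from NativeThread.c. The function name interrupt_blocked_read, the choice of SIGUSR1 as the wakeup signal, and the pipe-based reader are assumptions made purely for illustration. The only point shown is that a handler installed without SA_RESTART makes a blocked read() return with EINTR once the thread is signalled via pthread_kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wakeup signal for this sketch; a real port would pick its own. */
#define WAKEUP_SIGNAL SIGUSR1

static int pipe_fds[2];
static atomic_int reader_done;
static int reader_errno;

/* The handler does nothing: its only purpose is to interrupt a blocking call. */
static void wakeup_handler(int sig) { (void)sig; }

/* Blocks in read() on an empty pipe until the call is interrupted. */
static void *blocked_reader(void *arg) {
    char c;
    ssize_t n;
    (void)arg;
    n = read(pipe_fds[0], &c, 1);
    reader_errno = (n < 0) ? errno : 0;
    atomic_store(&reader_done, 1);
    return NULL;
}

/* Returns the errno the reader thread observed after being signalled. */
int interrupt_blocked_read(void) {
    struct sigaction sa;
    struct timespec pause = { 0, 1000000 }; /* 1 ms between signals */
    pthread_t reader;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART, so read() is not restarted */
    rc = sigaction(WAKEUP_SIGNAL, &sa, NULL);
    assert(rc == 0);

    rc = pipe(pipe_fds);
    assert(rc == 0);
    atomic_store(&reader_done, 0);
    rc = pthread_create(&reader, NULL, blocked_reader, NULL);
    assert(rc == 0);

    /* Keep signalling until the reader reports that read() has returned.
       Repeating covers the case where a signal arrives before read() starts. */
    while (!atomic_load(&reader_done)) {
        pthread_kill(reader, WAKEUP_SIGNAL);
        nanosleep(&pause, NULL);
    }

    pthread_join(reader, NULL);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return reader_errno;
}
```

Called once from a small test program (built with -pthread where the toolchain requires it), interrupt_blocked_read() is expected to return EINTR on a POSIX system such as Linux. Which signal would be appropriate on AIX, and whether dup2 on sockets needs more than this, is left open here, as it is in the discussion.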
From goetz.lindenmaier at sap.com Mon Jan 20 05:26:26 2014
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 20 Jan 2014 13:26:26 +0000
Subject: serialization page mechanism on PPC
Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8D91E@DEWDFEMB12A.global.corp.sap>

Hi Tiago,

I have a question regarding the linux implementation on PPC.

HotSpot contains a mechanism that forces a thread to enforce memory
ordering by using a "serialization page". The idea is that thread t1,
which rarely changes a certain field, forces another thread t2 to update
its memory state by poisoning a page. I.e.: t1 sets a shared page, called
the serialization page, to "READ" and then again to "RW". t2 does a write
on this page. If a page fault occurs after t1 changed the state, and
assuming that handling this fault forces a memory serialization, this
works as a conditional "sync" instruction executed in t2.

We have used this mechanism in our VM for a long time without problems,
but never explored the background of this trick. Now it was questioned
while discussing the OpenJDK PPC changes.

Do you know whether this works on PPC as it does on x86 etc.? Or do you
know somebody who can answer this question?

Basically it boils down to two questions:
- Will t2 execute some OS operation on the write after t1 changed the
  access rights?
- Does this operation force a sync or ptwsync or equivalent?

Actually, we also would like to know whether this works on AIX.

Best regards,
Goetz.

From goetz.lindenmaier at sap.com Mon Jan 20 05:41:26 2014
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 20 Jan 2014 13:41:26 +0000
Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization
In-Reply-To: <52DC8DA9.2010106@oracle.com>
References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> <52D821CB.6020207@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D010@DEWDFEMB12A.global.corp.sap> <52DC8DA9.2010106@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8D974@DEWDFEMB12A.global.corp.sap>

Hi David,

I understand your arguments and basically agree with them.
If the serialization page does not work on PPC, your solution
1) is best.

But I'm not sure why there should be a link between TSO and whether
the serialization page trick works. The second depends, as you say,
on the OS implementation.

I would assume that the write to the serialization page causes
the OS to generate a new TLB entry or the like, involving the
use of a ptwsync instruction which would be fine. But we are
investigating this.

We are also experimenting with a small regression test to find
out whether the serialization page ever fails.

Best regards,
Goetz.

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Montag, 20. Januar 2014 03:45
To: Lindenmaier, Goetz; Vladimir Kozlov
Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers'
Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization

On 17/01/2014 11:30 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> I had a look at the first part of this issue: Whether StoreStore
> is necessary in the interpreter. Let's for now assume the serialization
> page mechanism works on PPC.
>
> In the state transition leaving the VM state, which is executed in the
> destructor, ThreadStateTransition::transition() is called, which executes
>     if (UseMembar) {
>       OrderAccess::fence();
>     } else {
>       os::write_memory_serialize_page(thread);
>     }
>
> os::write_memory_serialize_page() can not be considered a proper
> MemBar, as it only serializes if another thread poisoned the page.
> Thus it does not qualify to order the initialization and the publishing
> of the object.
>
> You are right, if UseMembar is true, the StoreStore in the interpreter
> is superfluous. We could guard the StoreStores in the interpreter by
> !UseMembar.

My understanding, from our existing non-TSO system ports, is that the
present assumption is that either:

a) you have a TSO system, in which case you are probably using the
serialization page, but you don't need any barrier to enforce ordering
anyway; or

b) you don't have a TSO system, you are using UseMembar==true and so you
get a full fence inserted that enforces the ordering anyway.

So the ordering requirements are satisfied by piggy-backing on the
UseMembar setting that comes from the thread state transition code,
which forms part of the "runtime entry" code. That's not to say that you
will necessarily find this applied consistently in all places where it
might be applied - nor will you necessarily find that this is common
knowledge amongst VM engineers.

Technically the storeStore barriers could be conditional on !UseMembar
but that is redundant in the current usage.

> But then again, one is to order the publishing of the thread states,
> the other to enforce some Java semantics. I don't know whether everybody
> who changes in one place is aware of both issues. But if you want to,
> I'll add a !UseMembar in the interpreter.

Here are my preferred options in order:

1. Set UseMembar==true on PPC64 and drop these new storeStore barriers -
rely on the piggy-backing effect.

2. Conditionalize the new storeStore barriers on !UseMembar. This
unfortunately penalizes all platforms with a runtime check.

3. Add the storeStores unconditionally. This penalizes platforms that
set UseMembar==true as we will now get two fences at runtime.

I know we're talking about the interpreter here so performance is not
exactly critical, but still ...

> Maybe it would be a good idea
> to document the double use in interfaceSupport.cpp, too. And maybe
> add an assertion of some kind.

interfaceSupport doesn't know that other code piggy-backs on the fact
state-transitions have full fences when UseMembar is true. If it is
documented anywhere it should be in the interpreter (and any other
places that makes the same assumption) - something like:

// On non-TSO systems there can be additional ordering constraints
// between Java-level actions (such as allocation and constructor
// invocation) that in principle need explicit memory barriers.
// However, on many non-TSO systems the thread-state transition logic
// in the IRT_ENTRY code will insert a full fence due to the use of
// UseMembar==true, which provides the necessary ordering guarantees.

>
> We're digging into the other issue currently, whether the serialization page
> works on ppc. We understand your concerns and have no simple answer to
> it right now. At least, in our VM and in the port there are no known problems
> with the state transitions.

Even if the memory serialization page does not work, in a guaranteed
sense, on PPC-AIX, it is extremely unlikely that testing would expose
this. Also note that the memory serialization page behaviour is more a
function of the OS - so it may be that AIX is different to linux in that
regard.

Cheers,
David
-----

> Best regards,
> Goetz.
>
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Donnerstag, 16.
Januar 2014 19:16 > To: David Holmes; Lindenmaier, Goetz > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers' > Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization > > Changes are in C++ Interpreter so it does not affect Oracle VM. > But David has point here. I would like to hear the explanation too. > > BTW, I see that for ppc64: > > src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false); > > as result write_memory_serialize_page() is used in > ThreadStateTransition::transition(). > > Is it not enough on PPC64? > > Thanks, > Vladimir > > On 1/15/14 9:30 PM, David Holmes wrote: >> Can I get some response on this please - specifically the redundancy wrt >> IRT_ENTRY actions. >> >> Thanks, >> David >> >> On 20/12/2013 11:55 AM, David Holmes wrote: >>> Still catching up ... >>> >>> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> this change adds StoreStore barriers after object initialization and >>>> after constructor calls in the C++ interpreter. This assures no >>>> uninitialized >>>> objects or final fields are visible. >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/ >>> >>> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize >>> thread state transitions that already include a full "fence" so the >>> storestore barriers are redundant in those cases. >>> >>> The fastpath _new storestore seems okay. >>> >>> I don't know how handle_return gets used to know if it is reasonable or >>> not. >>> >>> I was trying, unsuccessfully, to examine the same code in the >>> templateInterpreter to see how it handles these cases as it naturally >>> has the same object-initialization-safety requirements (though these can >>> be handled in a number of different ways other than an unconditional >>> storestore barrier at the end of the initialization and construction >>> phases. 
>>> >>> David >>> ----- >>> >>>> Please review and test this change. >>>> >>>> Best regards, >>>> Goetz. >>>> From volker.simonis at gmail.com Mon Jan 20 05:45:12 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 20 Jan 2014 14:45:12 +0100 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: <52DD0B7C.4090605@oracle.com> References: <52D50118.3080000@oracle.com> <52DD0B7C.4090605@oracle.com> Message-ID: On Mon, Jan 20, 2014 at 12:41 PM, Alan Bateman wrote: > On 20/01/2014 09:59, Volker Simonis wrote: >> >> : >> Hi Alan, >> >> yes, that's interesting. Sounds like a very similar problem on Mac. >> >> I would suggest the following: >> >> I cut out the "Async Close AIX FIX" stuff from this change (i.e. >> "8031581: PPC64: Addons and fixes for AIX to pass the jdk regression >> tests" and send out a new webrev for the remaining part. I think that >> the remaining part was more or less reviewed and we can then push it >> faster. >> >> In the mean time, I'll recheck which tests exactly fail with my >> missing "Async Close AIX FIX" stuff and which of these tests will be >> fixed by your 7133499 webrev. Maybe we can really get trough with it >> or with it and a few enhancements. I'll let you know my results later >> today. By the way, my webrev already contained a AixNativeThread.c >> implementation in src/aix/native/sun/nio/ch. >> >> The only remaining problem I see with this approach is that we would >> need to downport your 7133499 change to 8u-dev in the 8u20 time frame >> to make our AIX port work. Would this be OK for you? >> > I'm okay with this plan and if you re-generate the webrev without the async > close changes then I can look at it quickly so that you can get it into the > stage-9 forest. > > On 7133499 then it would be a good candidate for 8u-dev too, I don't expect > any problems but we will need to get it approved on the jdk8u-dev list. > > -Alan. 
Hi everybody,

so here's the second version of this webrev:

http://cr.openjdk.java.net/~simonis/webrevs/8031581_2/

The main changes compared to the first webrev are as follows:

- the POLL-constants related stuff has been factored out into its own
webrev ("8031997: PPC64: Make the various POLL constants system
dependant" - https://bugs.openjdk.java.net/browse/JDK-8031997).

- the "Async close on AIX" workarounds have been taken out as well
and will be handled separately (probably together with Alan's fix for
http://cr.openjdk.java.net/~alanb/7133499/webrev/).

- in the remaining files I've applied the changes suggested by Staffan,
so I think the changes to the following files can be considered as
reviewed:

src/share/native/sun/management/DiagnosticCommandImpl.c
src/solaris/native/sun/management/OperatingSystemImpl.c
src/share/transport/socket/socketTransport.c
src/share/classes/sun/tools/attach/META-INF/services/com.sun.tools.attach.spi.AttachProvider

- I've added the following additional files to the change:

src/aix/classes/sun/nio/ch/sctp/SctpChannelImpl.java
src/aix/classes/sun/nio/ch/sctp/SctpMultiChannelImpl.java
src/aix/classes/sun/nio/ch/sctp/SctpServerChannelImpl.java

which are just empty stub implementations of the SCTP classes needed
to pass the SCTP jtreg tests.

All other changes should be the same as in the first review round.

Thanks,
Volker

From Alan.Bateman at oracle.com Mon Jan 20 07:24:01 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 20 Jan 2014 15:24:01 +0000
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To:
References: <52D50118.3080000@oracle.com> <52DD0B7C.4090605@oracle.com>
Message-ID: <52DD3F91.7020202@oracle.com>

On 20/01/2014 13:45, Volker Simonis wrote:
> :
> Hi everybody,
>
> so here's the second version of this webrev:
>
> http://cr.openjdk.java.net/~simonis/webrevs/8031581_2/

This looks okay to me.
The typo ("legel" -> "legal") still exists in zip_util.c and maybe that
can be fixed before you push this (no need to generate a new webrev of
course).

For the JDWP socket transport then it's interesting that shutdown is
being used to cause the reader thread to be preempted. That may be useful
when it comes to addressing the bigger async close issue.

>
> The main changes compared to the first webrev are as follows:
>
> - the POLL-constants related stuff has been factored out into its own
> webrev ("8031997: PPC64: Make the various POLL constants system
> dependant" - https://bugs.openjdk.java.net/browse/JDK-8031997).

I see this has been pushed to ppc-aix-port/stage-9. Would you have any
objection if I brought this into jdk9/dev (minus the AixPollPort
change)? We can use a different bug number so as not to cause duplicate
bug issues. It should trivially merge when you come to sync'ing up the
staging forest.

> - the "Async close on AIX" workarounds have been taken out as well
> and will be handled separately

Thanks for separating this one out as I suspect that doing this
cleanly is going to involve changes for all platforms.

-Alan.

From volker.simonis at gmail.com Mon Jan 20 08:29:13 2014
From: volker.simonis at gmail.com (Volker Simonis)
Date: Mon, 20 Jan 2014 17:29:13 +0100
Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests
In-Reply-To: <52DD3F91.7020202@oracle.com>
References: <52D50118.3080000@oracle.com> <52DD0B7C.4090605@oracle.com> <52DD3F91.7020202@oracle.com>
Message-ID:

On Mon, Jan 20, 2014 at 4:24 PM, Alan Bateman wrote:
> On 20/01/2014 13:45, Volker Simonis wrote:
>>
>> :
>> Hi everybody,
>>
>> so here's the second version of this webrev:
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/8031581_2/
>
> This looks okay to me.

Thanks.

>
> The typo ("legel" -> "legal") still exists in zip_util.c and maybe that can
> be fixed before you push this (no need to generate a new webrev of course).
>

Sorry, I've just fixed it in my patch queue and will use the fixed
version for pushing.

@Vladimir: could you please run this change
(http://cr.openjdk.java.net/~simonis/webrevs/8031581_2) through JPRT as
well? I'll push it (together with the fixed typo in the comment) if
everything is OK.

> For the JDWP socket transport then it's interesting that shutdown is being
> used to cause the reader thread to be preempted. That may be useful when it
> comes to addressing the bigger async close issue.
>
>
>>
>> The main changes compared to the first webrev are as follows:
>>
>> - the POLL-constants related stuff has been factored out into its own
>> webrev ("8031997: PPC64: Make the various POLL constants system
>> dependant" - https://bugs.openjdk.java.net/browse/JDK-8031997).
>
> I see this has been pushed to ppc-aix-port/stage-9. Would you have any
> objection if I brought this into jdk9/dev (minus the AixPollPort change)? We
> can use a different bug number so as not to cause duplicate bug issues. It
> should trivially merge when you come to sync'ing up the staging forest.
>

I have no objections of course. I'm just not sure what exact
implications this will have.

@Vladimir: what do you think - can Alan push "8031997: PPC64: Make the
various POLL constants system dependant" minus the Aix-specific stuff
to jdk9/dev now, without causing you any harm during integration?

@Alan: on the other hand, the bulk integration from
ppc-aix-port/stage-9 to jdk9/dev is planned for next week anyway, so
maybe you could wait until that happens?

Thanks,
Volker

>
>> - the "Async close on AIX" workarounds have been taken out as well
>> and will be handled separately
>
> Thanks for separating this one out as I suspect that doing this cleanly
> is going to involve changes for all platforms.
>
> -Alan.
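
The remark in this exchange that shutdown is used to get a blocked reader thread out of its read can be illustrated with a small sketch. This is not the JDWP socketTransport.c code: the function name is hypothetical, a local AF_UNIX socket pair stands in for the debugger connection, and the behaviour described is what Linux is assumed to do; nothing here makes a claim about AIX.

```cpp
#include <cassert>
#include <chrono>
#include <sys/socket.h>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Blocks a reader thread in recv() on a socket that never receives data,
// then calls shutdown() on that socket from another thread.
// Returns the value recv() returned in the reader (-2 on setup failure).
long shutdown_wakes_blocked_reader() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -2;

  ssize_t result = -2;
  std::thread reader([&] {
    char buf[16];
    result = recv(sv[0], buf, sizeof buf, 0);  // blocks: the peer never sends
  });

  // Give the reader a moment to block. This only makes the demo more
  // faithful; a shutdown issued before recv() also yields end of stream.
  std::this_thread::sleep_for(std::chrono::milliseconds(50));

  // Unlike close(), shutdown() keeps the descriptor valid but marks the
  // connection as finished, so the blocked recv() returns.
  shutdown(sv[0], SHUT_RDWR);

  reader.join();
  close(sv[0]);
  close(sv[1]);
  return static_cast<long>(result);
}
```

On Linux the reader is expected to see a return value of 0 (end of stream). The design point is that the descriptor number is not recycled while another thread may still be using it, which is the hazard that a plain close() from a second thread would introduce.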
From Alan.Bateman at oracle.com Mon Jan 20 08:42:39 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 20 Jan 2014 16:42:39 +0000 Subject: RFR(L): 8031581: PPC64: Addons and fixes for AIX to pass the jdk regression tests In-Reply-To: References: <52D50118.3080000@oracle.com> <52DD0B7C.4090605@oracle.com> <52DD3F91.7020202@oracle.com> Message-ID: <52DD51FF.7060502@oracle.com> On 20/01/2014 16:29, Volker Simonis wrote: > : > > @Alan: on the other hand, the bulk integration from > ppc-aix-port/stage-9 to jdk9/dev is planned for next week anyway, so > maybe you could wait until that happens? > In that case then ignore my request, I assumed it would not be pushed to jdk9/dev until end-Feb. -Alan From volker.simonis at gmail.com Mon Jan 20 11:57:03 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 20 Jan 2014 20:57:03 +0100 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: <52DD10A1.3040202@oracle.com> References: <52DD10A1.3040202@oracle.com> Message-ID: Hi Alan, I've tried your patch with our port on AIX. The good news is that it fixes: java/nio/channels/AsynchronousFileChannel/Lock.java on AIX as well. 
The bad news is that it doesn't seem to help for:

java/nio/channels/AsyncCloseAndInterrupt.java

Here's a stack trace of where the VM gets stuck:

"TestThread-FileChannel/transferTo/interrupt" #12 daemon prio=5 os_prio=57 tid=0x0000000119c07800 nid=0x1310131 runnable [0x000000011a1e3000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:51)
        at sun.nio.ch.SinkChannelImpl.write(SinkChannelImpl.java:167)
        - locked <0x0a000100255c9330> (a java.lang.Object)
        at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:468)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:564)
        at AsyncCloseAndInterrupt$18.doIO(AsyncCloseAndInterrupt.java:391)
        at AsyncCloseAndInterrupt$Tester.go(AsyncCloseAndInterrupt.java:485)
        at TestThread.run(TestThread.java:55)

"MainThread" #9 prio=5 os_prio=57 tid=0x0000000119289800 nid=0x2d50115 runnable [0x0000000119497000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.preClose0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.preClose(FileDispatcherImpl.java:102)
        at sun.nio.ch.SinkChannelImpl.implCloseSelectableChannel(SinkChannelImpl.java:88)
        - locked <0x0a000100255c9340> (a java.lang.Object)
        at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234)
        at java.nio.channels.spi.AbstractInterruptibleChannel$1.interrupt(AbstractInterruptibleChannel.java:165)
        - locked <0x0a000100255c9300> (a java.lang.Object)
        at java.lang.Thread.interrupt(Thread.java:918)
        - locked <0x0a000100255ceca0> (a java.lang.Object)
        at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:573)
        at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:609)
        at AsyncCloseAndInterrupt.main(AsyncCloseAndInterrupt.java:680)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94)
        at java.lang.Thread.run(Thread.java:744)

As you can see, it hangs in preclose, because it is also blocked in
write. I think here is where calling the interruptible write in my
initial change helped.

But now, after I saw your solution here for 7133499 I wonder if the
same technique you applied in FileChannelImpl.implCloseChannel()
wouldn't work here as well.

However if I naively call NativeThread.signal(th) before calling
nd.preClose(fd) in SinkChannelImpl.implCloseSelectableChannel() this
improves the situation only slightly, because the VM now hangs in a
read():

"TestThread-FileChannel/transferFrom/interrupt" #12 daemon prio=5 os_prio=57 tid=0x00000001199a0800 nid=0x429000f runnable [0x0000000119f02000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:46)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SourceChannelImpl.read(SourceChannelImpl.java:167)
        - locked <0x0a0001003ab1ee68> (a java.lang.Object)
        at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:625)
        at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:663)
        at AsyncCloseAndInterrupt$19.doIO(AsyncCloseAndInterrupt.java:401)
        at AsyncCloseAndInterrupt$Tester.go(AsyncCloseAndInterrupt.java:485)
        at TestThread.run(TestThread.java:55)

"main" #1 prio=5 os_prio=57 tid=0x000000011022c800 nid=0x50e0101 runnable [0x000000011021d000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.preClose0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.preClose(FileDispatcherImpl.java:102)
        at sun.nio.ch.SourceChannelImpl.implCloseSelectableChannel(SourceChannelImpl.java:88)
        - locked <0x0a0001003ab1ee78> (a java.lang.Object)
        at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234)
        at java.nio.channels.spi.AbstractInterruptibleChannel$1.interrupt(AbstractInterruptibleChannel.java:165)
        - locked <0x0a0001003ab1ee38> (a java.lang.Object)
        at java.lang.Thread.interrupt(Thread.java:918)
        - locked <0x0a0001003ab1f5b8> (a java.lang.Object)
        at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:573)
        at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:609)
        at AsyncCloseAndInterrupt.main(AsyncCloseAndInterrupt.java:681)

So I wonder if we would have to wrap all Java-calls to
close()/preClose() with NativeThread.signal() and all calls to
IO-functions like read/write/fcntl with NativeThreadSet.add()/remove()?
Maybe then it's easier doing it in the native interface (i.e. the NET_
wrappers) to just mimic the "usual" behaviour on AIX?

Regards,
Volker

On Mon, Jan 20, 2014 at 1:03 PM, Alan Bateman wrote:
>
> One of the outstanding issues from the OS X port is that the async close of
> a FileChannel where there are threads blocked doing I/O operation does not
> work, instead close hang (potentially indefinitely, say when another thread
> is blocked waiting for a file lock to be acquired or where the file is
> something like a pipe or other type of file where you can block
> indefinitely). From what I can tell, it wasn't implemented in Apple's JDK6
> either.
>
> In order to fix this on OS X then close needs to signal all threads that are
> blocked in I/O operations, something we already do on Linux. The other part
> is removing the preClose (the dup2) from the closing of FileChannels as it
> is not needed when you can signal.
> The webrev with the proposed changes is
> here:
>
> http://cr.openjdk.java.net/~alanb/7133499/webrev/
>
> Fixing this issue means that two tests can be removed from the exclude list
> (there is a third test removed from the exclude list too, that shouldn't have
> been there).
>
> -Alan.

From Alan.Bateman at oracle.com Mon Jan 20 13:34:18 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 20 Jan 2014 21:34:18 +0000
Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X
In-Reply-To:
References: <52DD10A1.3040202@oracle.com>
Message-ID: <52DD965A.1030505@oracle.com>

On 20/01/2014 19:57, Volker Simonis wrote:
> Hi Alan,
>
> I've tried your patch with our port on AIX.
> The good news is that it fixes:
>
> java/nio/channels/AsynchronousFileChannel/Lock.java
>
> on AIX as well.
>
> The bad news is, that it doesn't seem to help for:
>
> java/nio/channels/AsyncCloseAndInterrupt.java
>
> Here's a stack trace of where the VM gets stuck:

In these stack traces then the channels are Pipe.SourceChannel or
Pipe.SinkChannel where the file descriptor is to one end of a pipe. Are
these the only cases where you see these hangs? I'm interested to know
if the async close of SocketChannel and ServerSocketChannel when
configured blocking also hangs (I will guess that it will as the
behavior is likely to be the same as pipe).

I have an idea on how to fix this so that the preClose isn't used when
the channel is configured blocking (or isn't registered with a
Selector). The patch doesn't use NativeThreadSet because it isn't
efficient when the number of threads is limited to 1 or 2 (FileChannel
uses NativeThreadSet because it defines positional read/write and so the
number of concurrent reader/writers is unlimited). I'll send a patch for
the other channels soon and we can see if this works for you. If it does
work then I have a bit of a preference to bringing it in via jdk9/dev
rather than via the AIX staging forest because the changes impact all
platforms. That said, if we bring this FileChannel fix for OSX in then
it means that the platform specific changes would be in, the patch for
the other platforms shouldn't require porting.

-Alan.

From david.holmes at oracle.com Mon Jan 20 18:49:01 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 21 Jan 2014 12:49:01 +1000
Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8D974@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> <52D821CB.6020207@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D010@DEWDFEMB12A.global.corp.sap> <52DC8DA9.2010106@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D974@DEWDFEMB12A.global.corp.sap>
Message-ID: <52DDE01D.1060605@oracle.com>

On 20/01/2014 11:41 PM, Lindenmaier, Goetz wrote:
> Hi David,
>
> I understand your arguments and basically agree with them.
> If the serialization page does not work on PPC, your solution
> 1) is best.
>
> But I'm not sure why there should be a link between TSO and whether
> the serialization page trick works. The second depends, as you say,
> on the OS implementation.

My limited understanding is that on RMO-based systems the requirements
for:

"Synchronization based on page-protections - mprotect()"

as described in:

http://home.comcast.net/~pjbishop/Dave/Asymmetric-Dekker-Synchronization.txt

may not always hold. Dave Dice would need to provide more details if
needed.

Cheers,
David

> I would assume that the write to the serialization page causes
> the OS to generate a new TLB entry or the like, involving the
> use of a ptwsync instruction which would be fine. But we are
> investigating this.
>
> We are also experimenting with a small regression test to find
> out whether the serialization page ever fails.
>
> Best regards,
> Goetz.
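
The page-protection mechanism under discussion in this thread can be sketched in a few lines. The following is only an illustration of the mprotect() step and the resulting trap in the writing thread, not HotSpot's os::write_memory_serialize_page() implementation. All names are hypothetical, Linux behaviour (SIGSEGV on a write to a read-only page) is assumed, and for the sake of a deterministic demo t1 keeps the page read-only until one fault has been observed. The sketch says nothing about whether handling the fault implies a memory barrier on PPC, which is exactly the open question above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <thread>
#include <unistd.h>

static std::uintptr_t g_page_base = 0;   // start of the "serialization page"
static std::size_t g_page_size = 0;
static std::atomic<long> g_faults{0};    // traps taken by the writer (t2)

// If the fault hit the serialization page, count it and return, so the
// faulting store is retried; any other fault ends the process.
static void on_segv(int, siginfo_t* info, void*) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
  if (addr >= g_page_base && addr < g_page_base + g_page_size) {
    g_faults.fetch_add(1, std::memory_order_relaxed);
    return;
  }
  _exit(1);
}

// Returns the number of traps t2 took while the page was read-only,
// or -1 on setup failure.
long serialize_page_demo() {
  g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, g_page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;
  g_page_base = reinterpret_cast<std::uintptr_t>(p);
  g_faults.store(0);

  struct sigaction sa{};
  struct sigaction old_sa{};
  sa.sa_sigaction = on_segv;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  if (sigaction(SIGSEGV, &sa, &old_sa) != 0) return -1;

  // t2: keeps writing to the serialization page.
  volatile char* page = static_cast<volatile char*>(p);
  std::atomic<bool> stop{false};
  std::thread t2([&] {
    while (!stop.load(std::memory_order_relaxed)) page[0] = 1;
  });

  // t1: "poison" the page, wait until t2 has trapped, then restore it.
  mprotect(p, g_page_size, PROT_READ);
  while (g_faults.load(std::memory_order_relaxed) == 0) std::this_thread::yield();
  mprotect(p, g_page_size, PROT_READ | PROT_WRITE);

  stop.store(true);
  t2.join();
  sigaction(SIGSEGV, &old_sa, nullptr);
  munmap(p, g_page_size);
  return g_faults.load();
}
```

serialize_page_demo() returns at least 1 under these assumptions, which only answers the first of the two questions raised earlier (t2 does enter the OS on its write). The second question, whether that path implies a sync on PPC, cannot be answered by a test like this.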

From david.holmes at oracle.com Mon Jan 20 20:54:53 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 21 Jan 2014 14:54:53 +1000
Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com>
<4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> Message-ID: <52DDFD9D.3050205@oracle.com> Hi Goetz, On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: > Hi, > > I tried to come up with a webrev that implements the change as proposed in > your mails: > http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ > > Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use > support_IRIW_for_not_multiple_copy_atomic_cpu. Given the flag name the commentary eg: + // Support ordering of "Independent Reads of Independent Writes". + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { seems somewhat redundant. > I left the definition and handling of _wrote_volatile in the code, without > any protection. + bool _wrote_volatile; // Did we write a final field? s/final/volatile > I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , > and put it on one line. > > I removed the comment in library_call.cpp. > I also removed the sentence " Solution: implement volatile read as sync-load-acquire." > from the comments as it's PPC specific. I think the primary IRIW comment/explanation should go in globalDefinitions.hpp where support_IRIW_for_not_multiple_copy_atomic_cpu is defined. > Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these > issues in C1 if nobody did it by then. I've filed: https://bugs.openjdk.java.net/browse/JDK-8032366 "Implement C1 support for IRIW conformance on non-multiple-copy-atomic platforms" to cover this task, as it may be needed sooner rather than later. > Wrt. to performance: Oracle will soon do heavy testing of the port. If any > performance problems arise, we still can add #ifdef PPC64 to circumvent this. Ok. Thanks, David > Best regards, > Goetz. 
> > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Donnerstag, 16. Januar 2014 10:05 > To: Vladimir Kozlov > Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >> On 1/16/14 12:34 AM, David Holmes wrote: >>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>> want to keep it this way, it could be useful to have such info on other >>>> platforms too. But I would suggest to remove PPC64 comments in >>>> parse.hpp. >>>> >>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>> which could be checked in all places instead of #ifdef: >>> >>> I asked for the ifdef some time back as I find it much preferable to >>> have this as a build-time construct rather than a >>> runtime one. I don't want to have to pay anything for this if we don't >>> use it. >> >> Any decent C++ compiler will optimize expressions with such constants >> defined in header files. I insist to avoid #ifdefs in C2 code. I really >> don't like the code with #ifdef in unsafe.cpp but I can live with it. > > If you insist then we may as well do it all the same way. Better to be > consistent. > > My apologies Goetz for wasting your time going back and forth on this. > > That aside I have a further concern with this IRIW support - it is > incomplete as there is no C1 support, as PPC64 isn't using client. If > this is going on then we (which probably means the Oracle 'we') need to > add the missing C1 code. 
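The constant-instead-of-#ifdef guard debated in this thread can be sketched as follows. The macro and constant names are the ones used in the mails; everything else (`volatile_load`, `volatile_store`, the use of `std::atomic` and standard fences) is a hypothetical stand-in for the VM's own primitives, and the sketch only shows where the barrier is placed for each value of the constant. It makes no claim that the two placements are equivalent under the C++ memory model.

```cpp
#include <atomic>
#include <cassert>

// Defined once in a shared header; platforms that are not multiple-copy
// atomic and want IRIW support define CPU_NOT_MULTIPLE_COPY_ATOMIC.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical volatile-read helper: full fence *before* the load when
// IRIW support is requested ("sync-load-acquire" in the thread's terms).
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return field.load(std::memory_order_acquire);
}

// Hypothetical volatile-write helper: the StoreLoad-style fence *after*
// the store is kept only when the fence was not moved in front of loads.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
}
```

Because the condition is a namespace-scope `const bool` with a constant initializer, an optimizing compiler can normally drop the untaken branch, which is the argument made for preferring it over `#ifdef` blocks at every use site.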
> > David > ----- > >> Vladimir >> >>> >>> David >>> >>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>> #else >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>> #endif >>>> >>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>> >>>> and then: >>>> >>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>> oop p = JNIHandles::resolve(obj); \ >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>> OrderAccess::fence(); \ >>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>> >>>> And: >>>> >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>> field->is_volatile()) { >>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>> + } >>>> >>>> And so on. The comments will be needed only in globalDefinitions.hpp >>>> >>>> The code in parse1.cpp could be put on one line: >>>> >>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>> method()->is_initializer()) )) { >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>> Hi David, >>>>>> >>>>>> I updated the webrev: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> >>>>>> - I removed the IRIW example in parse3.cpp >>>>>> - I adapted the comments not to point to that comment, and to >>>>>> reflect the new flagging. Also I mention that we support the >>>>>> volatile constructor issue, but that it's not standard. >>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>> I also think it's better to separate these this way. >>>>> >>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>> please. >>>>> >>>>> One nit I missed before. 
In src/share/vm/opto/library_call.cpp this >>>>> comment doesn't make much sense to me and refers to >>>>> ppc specific stuff in a shared file: >>>>> >>>>> if (is_volatile) { >>>>> ! if (!is_store) { >>>>> insert_mem_bar(Op_MemBarAcquire); >>>>> ! } else { >>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>> insert_mem_bar(Op_MemBarVolatile); >>>>> + #endif >>>>> + } >>>>> >>>>> I don't think the comment is needed. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks for your comments! >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>> 'hotspot-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi Goetz, >>>>>> >>>>>> Sorry for the delay in getting back to this. >>>>>> >>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>> specifically it is >>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>> the commentary excessive, particularly for shared code. In particular >>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>> explanation and I don't think we need it to that level of detail. >>>>>> Seems >>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>> >>>>>> The changes related to volatile writes in the constructor, as >>>>>> discussed >>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>> PPC64 porters. 
>>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>> this version. >>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Lindenmaier, Goetz >>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>> To: David Holmes >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi David, >>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>> not >>>>>>>> required to pass. >>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>> worked >>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>> it by >>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>> part is >>>>>>> not that performance relevant. >>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>> I added a compile-time guard in this new webrev: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>> several double negations I don't like, (#ifNdef >>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>> but this way I only have to change the ppc platform. >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz >>>>>>> >>>>>>> P.S.: I will also be available over the Christmas period. 
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Friday, 20 December 2013 05:58
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>>>> emails.
>>>>>>>
>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>>>>> settled.
>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>>>
>>>>>>>> About our change: David, the causality is the other way round.
>>>>>>>> The change is about IRIW.
>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>
>>>>>>> This is the part I still have some question marks over as the
>>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>>> But I'm
>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>
>>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>>> stores.
>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>> constructor tests.
>>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>>> after stores
>>>>>>>> to pass the volatile constructor tests.
>>>>>>>
>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>> were not
>>>>>>> required to pass.
>>>>>>>
>>>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>>>> order
>>>>>>>> instructions are not as fine-grained as the
>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>>>> The only instruction
>>>>>>>> on PPC that does StoreLoad is sync.
But sync also does StoreStore, >>>>>>>> therefore the >>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>> proper representation >>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>> it's pointless >>>>>>>> anyways. >>>>>>>> >>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>> I'd be happy to add a property >>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>> >>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>> think >>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>> based not architecture based) as that will allows for turning this >>>>>>> on/off for any architecture for testing purposes. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>> Or also >>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>> >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> TL;DR version: >>>>>>>> >>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>> constructor-barrier >>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>> *may* >>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>> ensure >>>>>>>> you can't see uninitialized fields. 
So the tests for this are >>>>>>>> invalid >>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>> need it >>>>>>>> due to other factors). >>>>>>>> >>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>> term :) >>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>> >>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>> visible to >>>>>>>> all other hardware threads at the same time point; these >>>>>>>> architectures >>>>>>>> are not multiple-copy atomic." >>>>>>>> >>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>> ARM and >>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>> current issue is what you wrote below: >>>>>>>> >>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>> > remove the sync instruction from behind stores >>>>>>>> (parse3.cpp:320) >>>>>>>> > and place it before loads. >>>>>>>> >>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>> after >>>>>>>> the store then you have to do something around the loads to get the >>>>>>>> same >>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>> load - >>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>> clearer. >>>>>>>> >>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>> hooks are present. 
It would be better if volatile_load and >>>>>>>> volatile_store were factored out so that they could be >>>>>>>> implemented as >>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>> hooks >>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>> >>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>> this >>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>> >>>>>>>>> I did not yet add the >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> or >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>> >>>>>>>>> >>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>> >>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>> >>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>> store >>>>>>>>> and publishing the object. 
So we add it again in this one case
>>>>>>>>> (volatile store in constructor).
>>>>>>>>>
>>>>>>>>> @David
>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>> continuous.
>>>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>>>> issue.
>>>>>>>>> And we came up with a solution we consider the best possible. If
>>>>>>>>> you
>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>> solution,
>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> @David
>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>>>> and
>>>>>>>>>> can't find any reference to it.
>>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>>> ARM and
>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>> Work-Stealing for
>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the
>>>>>>>>> taskqueue problem.
>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>
>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>>> used 'read'
>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>>>>> above).
>>>>>>>>>
>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>> Goetz.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Wednesday, 27 November 2013 12:53
>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>
>>>>>>>>> Hi Goetz,
>>>>>>>>>
>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> -- Volatile in constructor
>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>> processors
>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks
>>>>>>>>>> Aleksey!)
>>>>>>>>>
>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>
>>>>>>>>>> -- IRIW issue
>>>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>>>> a few
>>>>>>>>>>> moments thought.
>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>>>>> problem yet,
>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>
>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>> continuous.
>>>>>>>>>
>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>> different?)
>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>>>>> I contributed a
>>>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>>>> nice, I admit.
>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>>>> are evaluated
>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>> >>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>> problem >>>>>>>>>> is that >>>>>>>>>> store >>>>>>>>>> sync >>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>> only >>>>>>>>>> sync >>>>>>>>>> load >>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>> pass that test. >>>>>>>>> >>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>> term and >>>>>>>>> can't find any reference to it. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>> The JMM is fine. And >>>>>>>>>> store >>>>>>>>>> MemBarVolatile >>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>> that >>>>>>>>>> do what is required. >>>>>>>>>> >>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi everybody, >>>>>>>>>>> >>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>> I'll try to answer to all in one mail. 
>>>>>>>>>>>
>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>> initializing
>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>>>>> Shipilev
>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>> will fail.
>>>>>>>>>>> (In addition, observing 0 instead of the initial value of a
>>>>>>>>>>> volatile field would be
>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>
>>>>>>>>>> The effects of unsafe publication are always surprising -
>>>>>>>>>> volatiles do
>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>>>> that
>>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>>> Aleksey
>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>
>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>> heavyweight
>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>>>>> don't
>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>
>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>> Basically, I agree on this. But you also have to take into
>>>>>>>>>>> account
>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>> different platforms
>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>> An example: >>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>> this: >>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>> Just doing what is required. >>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>> fine grained operations: >>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>> are more (too) powerful, >>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>> can not implement >>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>> >>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>> issue. >>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>> read >>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>> >>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>> independent? >>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>> >>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>> few >>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>> will >>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>> moment. 
The >>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>> requirements for >>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>> are not >>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>> store >>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>> of the >>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>> software >>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>> don't have >>>>>>>>>> that on ppc!) >>>>>>>>>> >>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>> I will >>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> -- Other ports: >>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>> it might >>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>> them). >>>>>>>>>>> >>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>> can >>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>> way :) >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>> >>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>> fix an >>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>> IRIW >>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>> doing the >>>>>>>>>>> load. >>>>>>>>>>> >>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>> would be >>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>> performance >>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>> that >>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>> this is >>>>>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>>>>> with the >>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>> the >>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>> properties >>>>>>>>>>> independent of architecture. 
If such a "barrier" is not needed >>>>>>>>>>> on a >>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>> reduce >>>>>>>>>>> to a no-op. >>>>>>>>>>> >>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>> under >>>>>>>>>>> the JMM. >>>>>>>>>>> >>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>> current >>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>> is set >>>>>>>>>>>> depending on platform: >>>>>>>>>>>> >>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>> >>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>> Java >>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>> explicitly >>>>>>>>>>> like this. >>>>>>>>>>> >>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>> there >>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>> defines? >>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>> David? >>>>>>>>>>> >>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>> inside else {}. >>>>>>>>>>>> >>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>> specific which >>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>> #ifdefs. 
>>>>>>>>>>>> >>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>> >>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>> #else >>>>>>>>>>>> if (is_field) { >>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>> volatile field >>>>>>>>>>>> (PPC64). >>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>> } >>>>>>>>>>>> #endif >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>> >>>>>>>>>>>>> Example: >>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>> >>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>> >>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>> is only >>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>> threads >>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>> >>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>> fields >>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>> published. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>> seems too >>>>>>>>>>>>> strong. >>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>> suffice. >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> From goetz.lindenmaier at sap.com Tue Jan 21 01:22:08 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 21 Jan 2014 09:22:08 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52DDFD9D.3050205@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5293FE15.9050100@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C4C5@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> Message-ID: 
<4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap>

Hi,

I made a new webrev
http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/
differing from
http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
only in the comments.
I removed
// Support ordering of "Independent Reads of Independent Writes".
everywhere, and edited the comments in the globalDefinition*.hpp files.

Best regards,
Goetz.

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Tuesday, 21 January 2014 05:55
To: Lindenmaier, Goetz; Vladimir Kozlov
Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

Hi Goetz,

On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> I tried to come up with a webrev that implements the change as proposed in
> your mails:
> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>
> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
> support_IRIW_for_not_multiple_copy_atomic_cpu.

Given the flag name the commentary e.g.:

+ // Support ordering of "Independent Reads of Independent Writes".
+ if (support_IRIW_for_not_multiple_copy_atomic_cpu) {

seems somewhat redundant.

> I left the definition and handling of _wrote_volatile in the code, without
> any protection.

+ bool _wrote_volatile; // Did we write a final field?

s/final/volatile

> I protected issuing the barrier for volatile in constructors with PPC64_ONLY(),
> and put it on one line.
>
> I removed the comment in library_call.cpp.
> I also removed the sentence " Solution: implement volatile read as sync-load-acquire."
> from the comments as it's PPC specific.

I think the primary IRIW comment/explanation should go in
globalDefinitions.hpp where
support_IRIW_for_not_multiple_copy_atomic_cpu is defined.

> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these
> issues in C1 if nobody did it by then.
I've filed:

https://bugs.openjdk.java.net/browse/JDK-8032366

"Implement C1 support for IRIW conformance on non-multiple-copy-atomic
platforms"

to cover this task, as it may be needed sooner rather than later.

> Wrt. to performance: Oracle will soon do heavy testing of the port. If any
> performance problems arise, we still can add #ifdef PPC64 to circumvent this.

Ok.

Thanks,
David

> Best regards,
> Goetz.
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Thursday, 16 January 2014 10:05
> To: Vladimir Kozlov
> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>> On 1/16/14 12:34 AM, David Holmes wrote:
>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>>> want to keep it this way, it could be useful to have such info on other
>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>> parse.hpp.
>>>>
>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>>> which could be checked in all places instead of #ifdef:
>>>
>>> I asked for the ifdef some time back as I find it much preferable to
>>> have this as a build-time construct rather than a
>>> runtime one. I don't want to have to pay anything for this if we don't
>>> use it.
>>
>> Any decent C++ compiler will optimize expressions with such constants
>> defined in header files. I insist to avoid #ifdefs in C2 code. I really
>> don't like the code with #ifdef in unsafe.cpp but I can live with it.
>
> If you insist then we may as well do it all the same way. Better to be
> consistent.
>
> My apologies Goetz for wasting your time going back and forth on this.
> > That aside I have a further concern with this IRIW support - it is > incomplete as there is no C1 support, as PPC64 isn't using client. If > this is going on then we (which probably means the Oracle 'we') need to > add the missing C1 code. > > David > ----- > >> Vladimir >> >>> >>> David >>> >>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>> #else >>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>> #endif >>>> >>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>> >>>> and then: >>>> >>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>> oop p = JNIHandles::resolve(obj); \ >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>> OrderAccess::fence(); \ >>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>> >>>> And: >>>> >>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>> field->is_volatile()) { >>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>> + } >>>> >>>> And so on. The comments will be needed only in globalDefinitions.hpp >>>> >>>> The code in parse1.cpp could be put on one line: >>>> >>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>> method()->is_initializer()) )) { >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>> Hi David, >>>>>> >>>>>> I updated the webrev: >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>> >>>>>> - I removed the IRIW example in parse3.cpp >>>>>> - I adapted the comments not to point to that comment, and to >>>>>> reflect the new flagging. Also I mention that we support the >>>>>> volatile constructor issue, but that it's not standard. >>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>> I also think it's better to separate these this way. 
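For readers following the #ifdef-versus-constant discussion, here is a minimal standalone C++ sketch of the approach Vladimir proposes: one compile-time constant, checked with an ordinary `if`, so the compiler can drop the untaken branch. It is not HotSpot code. `volatile_load` and `volatile_store` are hypothetical helpers, and `std::atomic` fences only stand in for `OrderAccess::fence()` and `MemBarVolatile`; the sketch shows the shape of the guard and makes no claim about correctness under the C++ memory model.

```cpp
#include <atomic>
#include <cassert>

// The macro name is taken from the thread; in this sketch it is simply
// undefined unless passed on the compiler command line (-D...).
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical helper: a "volatile" read. On a non-multiple-copy-atomic CPU
// the full fence is placed in front of the load (sync-load-acquire).
template <typename T>
T volatile_load(const std::atomic<T>& field) {
    if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    return field.load(std::memory_order_acquire);
}

// Hypothetical helper: a "volatile" write. The trailing full fence is only
// emitted when the fence was not already moved in front of the loads.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
    field.store(value, std::memory_order_release);
    if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
}
```

Because the flag is a constant known at compile time, an optimizing compiler is free to remove the dead branch, which is the point of Vladimir's remark about constants defined in header files.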
>>>>> >>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>> please. >>>>> >>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>> comment doesn't make much sense to me and refers to >>>>> ppc specific stuff in a shared file: >>>>> >>>>> if (is_volatile) { >>>>> ! if (!is_store) { >>>>> insert_mem_bar(Op_MemBarAcquire); >>>>> ! } else { >>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>> insert_mem_bar(Op_MemBarVolatile); >>>>> + #endif >>>>> + } >>>>> >>>>> I don't think the comment is needed. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks for your comments! >>>>>> >>>>>> Best regards, >>>>>> Goetz. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>>>> To: Lindenmaier, Goetz >>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>> 'hotspot-dev at openjdk.java.net' >>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>> Independent Reads of Independent Writes >>>>>> >>>>>> Hi Goetz, >>>>>> >>>>>> Sorry for the delay in getting back to this. >>>>>> >>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>> specifically it is >>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>> the commentary excessive, particularly for shared code. In particular >>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>> explanation and I don't think we need it to that level of detail. >>>>>> Seems >>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>> >>>>>> The changes related to volatile writes in the constructor, as >>>>>> discussed >>>>>> are not required by the Java Memory Model. 
If you want to keep these >>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>> PPC64 porters. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>> this version. >>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Lindenmaier, Goetz >>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>> To: David Holmes >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi David, >>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>> not >>>>>>>> required to pass. >>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>> worked >>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>> it by >>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>> part is >>>>>>> not that performance relevant. >>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>> I added a compile-time guard in this new webrev: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. 
This introduces >>>>>>> several double negations I don't like, (#ifNdef >>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>> but this way I only have to change the ppc platform. >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz >>>>>>> >>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>> emails. >>>>>>> >>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>> settled. >>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>> >>>>>>>> About our change: David, the causality is the other way round. >>>>>>>> The change is about IRIW. >>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>> >>>>>>> This is the part I still have some question marks over as the >>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>> But I'm >>>>>>> no further along in processing that paper I'm afraid. >>>>>>> >>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>> stores. >>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>> constructor tests. >>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>> after stores >>>>>>>> to pass the volatile constructor tests. >>>>>>> >>>>>>> So we can at least undo #4 now we have established those tests >>>>>>> were not >>>>>>> required to pass. 
>>>>>>>
>>>>>>>> We originally passed the constructor tests because the ppc memory order
>>>>>>>> instructions are not as fine-granular as the
>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>>>> The only instruction
>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>>>> therefore the
>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>>>>> proper representation
>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>>>>> it's pointless
>>>>>>>> anyways.
>>>>>>>>
>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>> I'd be happy to add a property
>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>
>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I think
>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>>> based not architecture based) as that will allow for turning this
>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>>>> Or also
>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>> Sent: Donnerstag, 28.
November 2013 00:34 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> TL;DR version: >>>>>>>> >>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>> constructor-barrier >>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>> *may* >>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>> ensure >>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>> invalid >>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>> need it >>>>>>>> due to other factors). >>>>>>>> >>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>> term :) >>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>> >>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>> visible to >>>>>>>> all other hardware threads at the same time point; these >>>>>>>> architectures >>>>>>>> are not multiple-copy atomic." >>>>>>>> >>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>> ARM and >>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>> current issue is what you wrote below: >>>>>>>> >>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>> > remove the sync instruction from behind stores >>>>>>>> (parse3.cpp:320) >>>>>>>> > and place it before loads. >>>>>>>> >>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>> after >>>>>>>> the store then you have to do something around the loads to get the >>>>>>>> same >>>>>>>> results! 
I still don't know what lead you to the conclusion that the >>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>> load - >>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>> clearer. >>>>>>>> >>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>> volatile_store were factored out so that they could be >>>>>>>> implemented as >>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>> hooks >>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>> >>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>> this >>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>> >>>>>>>>> I did not yet add the >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> or >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>> >>>>>>>>> >>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>> >>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>> this a substantial task in implementing a PPC port. 
Also we think
>>>>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>>>
>>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>> and place it before loads. Then there is no sync between volatile store
>>>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>>>> (volatile store in constructor).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @David
>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>> continuous.
>>>>>>>>> That's not true, we did a lot of investigation and testing on this issue.
>>>>>>>>> And we came up with a solution we consider the best possible. If you
>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>> solution,
>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>>>>>>
>>>>>>>>> @David
>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>>>>>>> can't find any reference to it.
>>>>>>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>> Work-Stealing for
>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the
>>>>>>>>> taskqueue problem.
>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>> >>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>> used 'read' >>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>> above). >>>>>>>>> >>>>>>>>> Best regards and thanks for all your involvements, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> -- Volatile in constuctor >>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>> missing constructor barrier. >>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>> processors >>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>> Aleksey!) >>>>>>>>> >>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>> >>>>>>>>>> -- IRIW issue >>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>> a few >>>>>>>>>>> moments thought. >>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>> problem yet, >>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>> >>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>> continuous. >>>>>>>>> >>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>> different?) >>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. 
Therefore >>>>>>>>>> I contributed a >>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>> nice, I admit. >>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>> are evaluated >>>>>>>>>> by the C-compiler I'm happy. >>>>>>>>>> >>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>> problem >>>>>>>>>> is that >>>>>>>>>> store >>>>>>>>>> sync >>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>> only >>>>>>>>>> sync >>>>>>>>>> load >>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>> pass that test. >>>>>>>>> >>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>> term and >>>>>>>>> can't find any reference to it. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>> The JMM is fine. And >>>>>>>>>> store >>>>>>>>>> MemBarVolatile >>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>> that >>>>>>>>>> do what is required. >>>>>>>>>> >>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. 
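As background to the IRIW discussion, the following standalone C++ sketch enumerates every sequentially consistent interleaving of the four-thread example from the original review request (writer of x, reader of x then y, writer of y, reader of y then x) and collects the observable results. It is a single-threaded model, not a hardware test. The types and function names (`State`, `Outcome`, `sc_outcomes`) are invented for illustration. It shows why a single global order of volatile accesses excludes the outcome in which the two readers disagree about the order of the writes.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <tuple>

// (r1, r2) are Thread 1's reads of x then y; (r3, r4) are Thread 3's reads
// of y then x.
using Outcome = std::tuple<int, int, int, int>;

struct State {
    int x = 0, y = 0;                      // the two volatile variables
    int r1 = -1, r2 = -1, r3 = -1, r4 = -1;
    std::array<int, 4> pc{{0, 0, 0, 0}};   // next step of each thread
};

// Number of steps per thread: T0 writes x, T1 reads x,y, T2 writes y,
// T3 reads y,x.
static const int kSteps[4] = {1, 2, 1, 2};

// Execute the next step of thread t against the single shared memory.
static void step(State& s, int t) {
    switch (t) {
    case 0: s.x = 1; break;
    case 1: if (s.pc[1] == 0) s.r1 = s.x; else s.r2 = s.y; break;
    case 2: s.y = 1; break;
    case 3: if (s.pc[3] == 0) s.r3 = s.y; else s.r4 = s.x; break;
    }
    ++s.pc[t];
}

// Depth-first enumeration of all interleavings that respect program order.
static void explore(const State& s, std::set<Outcome>& outcomes, long& count) {
    bool finished = true;
    for (int t = 0; t < 4; ++t) {
        if (s.pc[t] < kSteps[t]) {
            finished = false;
            State next = s;
            step(next, t);
            explore(next, outcomes, count);
        }
    }
    if (finished) {
        outcomes.emplace(s.r1, s.r2, s.r3, s.r4);
        ++count;
    }
}

// Returns every outcome reachable under sequential consistency; optionally
// reports how many interleavings were explored.
std::set<Outcome> sc_outcomes(long* interleavings = nullptr) {
    std::set<Outcome> outcomes;
    long count = 0;
    explore(State{}, outcomes, count);
    if (interleavings) *interleavings = count;
    return outcomes;
}
```

The outcome (1, 0, 1, 0) would require x's write before y's write for one reader and the opposite order for the other, which no single interleaving can provide. Hardware that is not multiple-copy atomic can still produce it unless suitable barriers are used, which is the problem the thread is trying to solve.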
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi everybody, >>>>>>>>>>> >>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>> I'll try to answer to all in one mail. >>>>>>>>>>> >>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>> initializing >>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>> Shipilev >>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>> LongVolatileTest >>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>> will fail. >>>>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>>>> volatile field would be >>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>> AtomicInteger.) >>>>>>>>>> >>>>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>>>> volatiles do >>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>> that >>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>> Aleksey >>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>> missing constructor barrier. 
>>>>>>>>>> >>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>> heavyweight >>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>> don't >>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>> >>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>> OrderAccess operations >>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>> account >>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>> different platforms >>>>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>>>> An example: >>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>> this: >>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>> Just doing what is required. >>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>> fine grained operations: >>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>> are more (too) powerful, >>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>> can not implement >>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>> >>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>> issue. >>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>> read >>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. 
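The barrier mapping Goetz describes can be modelled in a few lines of standalone C++. This is a sketch under assumptions, not compiler code. The bit masks follow the orderings named in the mail (lwsync gives LoadLoad, LoadStore and StoreStore; sync adds StoreLoad). `cheapest_insn` and `emit` are hypothetical helpers that illustrate why adjacent MemBarRelease and MemBarVolatile nodes yield a superfluous lwsync unless they are merged.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The four basic ordering constraints, one bit each.
enum Ordering : std::uint8_t {
    LoadLoad = 1, LoadStore = 2, StoreStore = 4, StoreLoad = 8
};

// The two PPC barrier instructions discussed in the thread.
enum class PpcInsn { lwsync, sync };

// Orderings each instruction provides, as stated in the mail.
constexpr std::uint8_t provided(PpcInsn insn) {
    return insn == PpcInsn::sync
        ? (LoadLoad | LoadStore | StoreStore | StoreLoad)
        : (LoadLoad | LoadStore | StoreStore);
}

// Orderings the IR nodes require, as stated in the mail.
constexpr std::uint8_t kMemBarRelease  = LoadStore | StoreStore;
constexpr std::uint8_t kMemBarVolatile = StoreLoad;

// Pick lwsync when it covers everything required, otherwise sync.
constexpr PpcInsn cheapest_insn(std::uint8_t required) {
    return (required & ~provided(PpcInsn::lwsync)) == 0 ? PpcInsn::lwsync
                                                        : PpcInsn::sync;
}

// Translate a run of consecutive barrier nodes. Without merging, every node
// gets its own instruction; with merging, the requirements are OR-ed
// together and a single instruction is chosen.
std::vector<PpcInsn> emit(const std::vector<std::uint8_t>& barriers,
                          bool merge_adjacent) {
    std::vector<PpcInsn> out;
    if (barriers.empty()) return out;
    if (merge_adjacent) {
        std::uint8_t combined = 0;
        for (std::uint8_t b : barriers) combined |= b;
        out.push_back(cheapest_insn(combined));
    } else {
        for (std::uint8_t b : barriers) out.push_back(cheapest_insn(b));
    }
    return out;
}
```

With `merge_adjacent` set to false the pair becomes lwsync followed by sync; with it set to true only the sync remains, which matches the extra optimization the mail says is needed on Power.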
>>>>>>>>>>> >>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>> independent? >>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>> >>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>> few >>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>> will >>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>> moment. The >>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>> requirements for >>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>> are not >>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>> store >>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>> of the >>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>> software >>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>> don't have >>>>>>>>>> that on ppc!) >>>>>>>>>> >>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>> I will >>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> -- Other ports: >>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>> it might >>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>> them). >>>>>>>>>>> >>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>> I agree we should not change it in the ppc port. 
If you wish, I >>>>>>>>>>> can >>>>>>>>>>> prepare an extra webrev for hotspot-comp. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>> way :) >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>> >>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>> fix an >>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>> IRIW >>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>> doing the >>>>>>>>>>> load. >>>>>>>>>>> >>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>> would be >>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>> performance >>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>> that >>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>> this is >>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>> with the >>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>> the >>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>> properties >>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>> on a >>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>> reduce >>>>>>>>>>> to a no-op. >>>>>>>>>>> >>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>> under >>>>>>>>>>> the JMM. >>>>>>>>>>> >>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>> current >>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>> is set >>>>>>>>>>>> depending on platform: >>>>>>>>>>>> >>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>> >>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>> Java >>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>> explicitly >>>>>>>>>>> like this. >>>>>>>>>>> >>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>> there >>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>> defines? >>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>> David? >>>>>>>>>>> >>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>> >>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>> specific which >>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>> #ifdefs. >>>>>>>>>>>> >>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>> >>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>> #else >>>>>>>>>>>> if (is_field) { >>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>> volatile field >>>>>>>>>>>> (PPC64). >>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>> } >>>>>>>>>>>> #endif >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I prepared a webrev with fixes for PPC for the >>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>> >>>>>>>>>>>>> Example: >>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>> >>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>> >>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>> is only >>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>> threads >>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
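[Editorial sketch, not part of the original mail.] The IRIW example quoted above can be checked mechanically. The following is a minimal model, assuming that volatile accesses behave as one total order consistent with program order (sequential consistency): it enumerates every interleaving of the four threads and collects the values the two readers can observe. The names Op, Outcome, iriw_program, explore and sc_outcomes are made up for this illustration and do not come from HotSpot or from the webrev.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One memory operation of the litmus test.
struct Op {
    bool is_write;  // true: store 1 to var; false: load var into slot reg
    int var;        // 0 = x, 1 = y
    int reg;        // result slot for loads (unused for stores)
};

// Observed values: {Thread 1: x, Thread 1: y, Thread 3: y, Thread 3: x}
using Outcome = std::array<int, 4>;

// The four threads of the IRIW example, each in program order.
inline std::vector<std::vector<Op>> iriw_program() {
    return {
        {Op{true, 0, -1}},                  // Thread 0: write(x=1)
        {Op{false, 0, 0}, Op{false, 1, 1}}, // Thread 1: read(x); read(y)
        {Op{true, 1, -1}},                  // Thread 2: write(y=1)
        {Op{false, 1, 2}, Op{false, 0, 3}}, // Thread 3: read(y); read(x)
    };
}

// Recursively tries every next step that respects program order. All
// threads act on one shared memory, which is the sequential consistency
// assumption of this model.
inline void explore(const std::vector<std::vector<Op>>& prog,
                    std::vector<std::size_t> pc, std::array<int, 2> mem,
                    Outcome regs, std::set<Outcome>& out) {
    bool done = true;
    for (std::size_t t = 0; t < prog.size(); ++t) {
        if (pc[t] == prog[t].size()) continue;  // thread t has finished
        done = false;
        const Op& op = prog[t][pc[t]];
        auto pc2 = pc;
        auto mem2 = mem;
        auto regs2 = regs;
        if (op.is_write) {
            mem2[op.var] = 1;
        } else {
            regs2[op.reg] = mem2[op.var];
        }
        ++pc2[t];
        explore(prog, pc2, mem2, regs2, out);
    }
    if (done) out.insert(regs);  // all threads finished: record the outcome
}

// All outcomes reachable under the sequential consistency assumption.
inline std::set<Outcome> sc_outcomes() {
    auto prog = iriw_program();
    std::set<Outcome> out;
    explore(prog, std::vector<std::size_t>(prog.size(), 0), {0, 0},
            {0, 0, 0, 0}, out);
    return out;
}
```

In this model the outcome marked "Disallowed" in the example (Thread 1 sees x=1, y=0 while Thread 3 sees y=1, x=0) is never produced, because it would require the two writes to be ordered both ways. Hardware that is not multiple-copy-atomic can produce it unless the barriers rule it out, which is what the change under review is about.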
>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>> >>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>> >>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>> fields >>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>> published. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>> seems too >>>>>>>>>>>>> strong. >>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>> suffice. >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> From david.holmes at oracle.com Tue Jan 21 03:53:20 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 21 Jan 2014 21:53:20 +1000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> 
<4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> Message-ID: <52DE5FB0.5000808@oracle.com> Thanks Goetz! This typo still exists: + bool _wrote_volatile; // Did we write a final field? s/final/volatile/ Otherwise no further comments from me. David On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote: > Hi, > > I made a new webrev > http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/ > differing from > http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ > only in the comments. > > I removed > // Support ordering of "Independent Reads of Independent Writes". > everywhere, and edited the comments in the globalDefinition*.hpp > files. > > Best regards, > Goetz. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 21. Januar 2014 05:55 > To: Lindenmaier, Goetz; Vladimir Kozlov > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Hi Goetz, > > On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> I tried to come up with a webrev that implements the change as proposed in >> your mails: >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >> >> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >> support_IRIW_for_not_multiple_copy_atomic_cpu. > > Given the flag name the commentary eg: > > + // Support ordering of "Independent Reads of Independent Writes". > + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { > > seems somewhat redundant. > >> I left the definition and handling of _wrote_volatile in the code, without >> any protection. > > + bool _wrote_volatile; // Did we write a final field? > > s/final/volatile > >> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >> and put it on one line. 
>> >> I removed the comment in library_call.cpp. >> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >> from the comments as it's PPC specific. > > I think the primary IRIW comment/explanation should go in > globalDefinitions.hpp where > support_IRIW_for_not_multiple_copy_atomic_cpu is defined. > >> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these >> issues in C1 if nobody did it by then. > > I've filed: > > https://bugs.openjdk.java.net/browse/JDK-8032366 > > "Implement C1 support for IRIW conformance on non-multiple-copy-atomic > platforms" > > to cover this task, as it may be needed sooner rather than later. > >> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >> performance problems arise, we still can add #ifdef PPC64 to circumvent this. > > Ok. > > Thanks, > David > >> Best regards, >> Goetz. >> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 16. Januar 2014 10:05 >> To: Vladimir Kozlov >> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>> On 1/16/14 12:34 AM, David Holmes wrote: >>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>> want to keep it this way, it could be useful to have such info on other >>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>> parse.hpp. 
>>>>> >>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>> which could be checked in all places instead of #ifdef: >>>> >>>> I asked for the ifdef some time back as I find it much preferable to >>>> have this as a build-time construct rather than a >>>> runtime one. I don't want to have to pay anything for this if we don't >>>> use it. >>> >>> Any decent C++ compiler will optimize expressions with such constants >>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >> >> If you insist then we may as well do it all the same way. Better to be >> consistent. >> >> My apologies Goetz for wasting your time going back and forth on this. >> >> That aside I have a further concern with this IRIW support - it is >> incomplete as there is no C1 support, as PPC64 isn't using client. If >> this is going on then we (which probably means the Oracle 'we') need to >> add the missing C1 code. >> >> David >> ----- >> >>> Vladimir >>> >>>> >>>> David >>>> >>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>>> #else >>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>>> #endif >>>>> >>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>>> >>>>> and then: >>>>> >>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>>> oop p = JNIHandles::resolve(obj); \ >>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>>> OrderAccess::fence(); \ >>>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>>> >>>>> And: >>>>> >>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>>> field->is_volatile()) { >>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>>> + } >>>>> >>>>> And so on. 
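[Editorial sketch, not part of the original mail.] The pattern suggested above, one #ifdef in a header that defines a constant and plain `if` tests everywhere else, can be illustrated in isolation. This is a simplified stand-in, not HotSpot code: the barrier names are plain strings, and volatile_load_sequence and volatile_store_sequence are hypothetical helpers that take the flag as a parameter so both configurations can be inspected in one build.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The only preprocessor test. A platform header for a CPU that is not
// multiple-copy-atomic would define CPU_NOT_MULTIPLE_COPY_ATOMIC earlier.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

using Seq = std::vector<std::string>;

// Barrier placement around a volatile load: the full fence moves in front
// of the load when IRIW support is requested.
inline Seq volatile_load_sequence(bool support_iriw) {
    Seq s;
    if (support_iriw) s.push_back("fence");
    s.push_back("load");
    s.push_back("acquire");
    return s;
}

// Barrier placement around a volatile store: the trailing full fence is
// only needed when the loads do not carry it.
inline Seq volatile_store_sequence(bool support_iriw) {
    Seq s;
    s.push_back("release");
    s.push_back("store");
    if (!support_iriw) s.push_back("fence");
    return s;
}
```

Shared code would then call, for example, volatile_load_sequence(support_IRIW_for_not_multiple_copy_atomic_cpu). Because the argument is a compile-time constant, a typical optimizing compiler can fold the branch away, which is the argument made in the thread for preferring this over scattered #ifdefs.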
The comments will be needed only in globalDefinitions.hpp >>>>> >>>>> The code in parse1.cpp could be put on one line: >>>>> >>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>>> method()->is_initializer()) )) { >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> I updated the webrev: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> >>>>>>> - I removed the IRIW example in parse3.cpp >>>>>>> - I adapted the comments not to point to that comment, and to >>>>>>> reflect the new flagging. Also I mention that we support the >>>>>>> volatile constructor issue, but that it's not standard. >>>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>>> I also think it's better to separate these this way. >>>>>> >>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>>> please. >>>>>> >>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>>> comment doesn't make much sense to me and refers to >>>>>> ppc specific stuff in a shared file: >>>>>> >>>>>> if (is_volatile) { >>>>>> ! if (!is_store) { >>>>>> insert_mem_bar(Op_MemBarAcquire); >>>>>> ! } else { >>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>>> insert_mem_bar(Op_MemBarVolatile); >>>>>> + #endif >>>>>> + } >>>>>> >>>>>> I don't think the comment is needed. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks for your comments! >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Mittwoch, 15. 
Januar 2014 01:55 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>>> 'hotspot-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi Goetz, >>>>>>> >>>>>>> Sorry for the delay in getting back to this. >>>>>>> >>>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>>> specifically it is >>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>>> the commentary excessive, particularly for shared code. In particular >>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>>> explanation and I don't think we need it to that level of detail. >>>>>>> Seems >>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>>> >>>>>>> The changes related to volatile writes in the constructor, as >>>>>>> discussed >>>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>> PPC64 porters. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>> this version. >>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Lindenmaier, Goetz >>>>>>>> Sent: Freitag, 20. 
Dezember 2013 13:47 >>>>>>>> To: David Holmes >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi David, >>>>>>>> >>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>> not >>>>>>>>> required to pass. >>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>> worked >>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>> it by >>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>>> part is >>>>>>>> not that performance relevant. >>>>>>>> >>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>> think >>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>> but this way I only have to change the ppc platform. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz >>>>>>>> >>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>>> emails. 
>>>>>>>> >>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>> settled. >>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>> >>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>> The change is about IRIW. >>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>> >>>>>>>> This is the part I still have some question marks over as the >>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>> But I'm >>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>> >>>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>>> stores. >>>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>>> constructor tests. >>>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>>> after stores >>>>>>>>> to pass the volatile constructor tests. >>>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests >>>>>>>> were not >>>>>>>> required to pass. >>>>>>>> >>>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>>> order >>>>>>>>> instructions are not as fine-grained as the >>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>>> The only instruction >>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>>>> therefore the >>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>>> proper representation >>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>>> it's pointless >>>>>>>>> anyways. >>>>>>>>> >>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>> I'd be happy to add a property >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>>> based not architecture based) as that will allows for turning this >>>>>>>> on/off for any architecture for testing purposes. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>>> Or also >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> TL;DR version: >>>>>>>>> >>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>> constructor-barrier >>>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>>> *may* >>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>> ensure >>>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>>> invalid >>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>> need it >>>>>>>>> due to other factors). >>>>>>>>> >>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>> term :) >>>>>>>>> Second thanks for the reference to that paper! 
For reference: >>>>>>>>> >>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>> visible to >>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>> architectures >>>>>>>>> are not multiple-copy atomic." >>>>>>>>> >>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>> ARM and >>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>>> current issue is what you wrote below: >>>>>>>>> >>>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>>> > remove the sync instruction from behind stores >>>>>>>>> (parse3.cpp:320) >>>>>>>>> > and place it before loads. >>>>>>>>> >>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>>> after >>>>>>>>> the store then you have to do something around the loads to get the >>>>>>>>> same >>>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>>> load - >>>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>>> clearer. >>>>>>>>> >>>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>>> volatile_store were factored out so that they could be >>>>>>>>> implemented as >>>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>>> hooks >>>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>>> platform-specific ifdef's to handle it as per your patch. 
>>>>>>>>> >>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>>> this >>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>> >>>>>>>>>> I did not yet add the >>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>> or >>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>> >>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>> >>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>>> store >>>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>>> (volatile store in constructor). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> @David >>>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>>> and that's being discussed now for almost a year. 
>>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>> continuous. >>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>>> issue. >>>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>>> you >>>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>>> solution, >>>>>>>>>> we would volunteer to implement and test it. >>>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>>> issues. >>>>>>>>>> >>>>>>>>>> @David >>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>>> and >>>>>>>>>>> can't find any reference to it. >>>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>>> ARM and >>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>>> Work-Stealing for >>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>>>> taskqueue problem. >>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>>> >>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>>> used 'read' >>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>>> above). >>>>>>>>>> >>>>>>>>>> Best regards and thanks for all your involvements, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Mittwoch, 27.
November 2013 12:53 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> -- Volatile in constuctor >>>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>>> processors >>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>>> Aleksey!) >>>>>>>>>> >>>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>>> >>>>>>>>>>> -- IRIW issue >>>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>>> a few >>>>>>>>>>>> moments thought. >>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>>> problem yet, >>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>> >>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>> continuous. >>>>>>>>>> >>>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>> different?) >>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>>> I contributed a >>>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>>> nice, I admit. >>>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>>> are evaluated >>>>>>>>>>> by the C-compiler I'm happy. 
>>>>>>>>>>> >>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>> problem >>>>>>>>>>> is that >>>>>>>>>>> store >>>>>>>>>>> sync >>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>> only >>>>>>>>>>> sync >>>>>>>>>>> load >>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>> pass that test. >>>>>>>>>> >>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>> term and >>>>>>>>>> can't find any reference to it. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>> The JMM is fine. And >>>>>>>>>>> store >>>>>>>>>>> MemBarVolatile >>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>> that >>>>>>>>>>> do what is required. >>>>>>>>>>> >>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Hi Goetz, >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi everybody, >>>>>>>>>>>> >>>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>>> I'll try to answer to all in one mail. 
>>>>>>>>>>>> >>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>>> initializing >>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>>> Shipilev >>>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>>> LongVolatileTest >>>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>>> will fail. >>>>>>>>>>>> (In addition, observing 0 instead of the initial value of a >>>>>>>>>>>> volatile field would be >>>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>>> AtomicInteger.) >>>>>>>>>>> >>>>>>>>>>> The effects of unsafe publication are always surprising - >>>>>>>>>>> volatiles do >>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>>> that >>>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>>> Aleksey >>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>>> missing constructor barrier. >>>>>>>>>>> >>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>>> heavyweight >>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>>> don't >>>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>>> >>>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>>> OrderAccess operations >>>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>>> account >>>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>>> different platforms >>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>> An example: >>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>> this: >>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>> fine grained operations: >>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>> can not implement >>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>> >>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>> issue. >>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>> read >>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>>> >>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>> independent? >>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>> >>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>> few >>>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>>> will >>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>> different?) 
and I can not take that on face value at the >>>>>>>>>>> moment. The >>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>> requirements for >>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>> are not >>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>> store >>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>> of the >>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>> software >>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>> don't have >>>>>>>>>>> that on ppc!) >>>>>>>>>>> >>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>> I will >>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> -- Other ports: >>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>> it might >>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>> them). >>>>>>>>>>>> >>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>> can >>>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
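The MemBarRelease/MemBarVolatile mapping described earlier in this message can be sketched as a small peephole. This is only an illustration under stated assumptions, not HotSpot code: the `MemBar` enum and `emit_ppc` are hypothetical names, and the mapping (MemBarRelease to lwsync, MemBarVolatile to sync) is taken from the message itself. Since sync orders everything lwsync does, a release barrier that is immediately followed by a volatile barrier can be dropped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical barrier kinds, named after the IR nodes mentioned in the thread.
enum class MemBar { Release, Volatile };

// Assumed Power mapping from the message: Release -> lwsync, Volatile -> sync.
// A Release directly followed by a Volatile is elided, because the following
// sync already provides the LoadStore/StoreStore ordering of lwsync.
std::vector<std::string> emit_ppc(const std::vector<MemBar>& bars) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < bars.size(); ++i) {
    if (bars[i] == MemBar::Release) {
      const bool next_is_volatile =
          i + 1 < bars.size() && bars[i + 1] == MemBar::Volatile;
      if (!next_is_volatile) {
        out.push_back("lwsync");  // still needed when used independently
      }
    } else {
      out.push_back("sync");
    }
  }
  return out;
}
```

A stand-alone MemBarRelease still produces lwsync, which matches the remark that MemBarRelease cannot simply be implemented empty.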
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>> way :) >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>> >>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>> fix an >>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>> IRIW >>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>> doing the >>>>>>>>>>>> load. >>>>>>>>>>>> >>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>> would be >>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>> performance >>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>> that >>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>> this is >>>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>>> with the >>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>> the >>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>> properties >>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>> on a >>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>> reduce >>>>>>>>>>>> to a no-op. >>>>>>>>>>>> >>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>> under >>>>>>>>>>>> the JMM. >>>>>>>>>>>> >>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>> current >>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>> is set >>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>> >>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>> >>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>> >>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>> Java >>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>> explicitly >>>>>>>>>>>> like this. >>>>>>>>>>>> >>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>>> there >>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>> defines? >>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>> David? >>>>>>>>>>>> >>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>>> specific which >>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>>> #ifdefs. >>>>>>>>>>>>> >>>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>>> >>>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>>> #else >>>>>>>>>>>>> if (is_field) { >>>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>>> volatile field >>>>>>>>>>>>> (PPC64). >>>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Vladimir >>>>>>>>>>>>> >>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> Example: >>>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>>> >>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>>> is only >>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>>> threads >>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. 
>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>>> fields >>>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>>> published. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>>> seems too >>>>>>>>>>>>>> strong. >>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>>> suffice. >>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>> From goetz.lindenmaier at sap.com Tue Jan 21 05:19:28 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 21 Jan 2014 13:19:28 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52DE5FB0.5000808@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <52948FF1.5080300@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6C554@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> 
<52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> <52DE5FB0.5000808@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8EC55@DEWDFEMB12A.global.corp.sap> Sorry, I missed that. fixed. Best regards, Goetz. -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Dienstag, 21. Januar 2014 12:53 To: Lindenmaier, Goetz; Vladimir Kozlov Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Thanks Goetz! This typo still exists: + bool _wrote_volatile; // Did we write a final field? s/final/volatile/ Otherwise no further comments from me. David On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote: > Hi, > > I made a new webrev > http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/ > differing from > http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ > only in the comments. > > I removed > // Support ordering of "Independent Reads of Independent Writes". > everywhere, and edited the comments in the globalDefinition*.hpp > files. > > Best regards, > Goetz. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 21. 
Januar 2014 05:55 > To: Lindenmaier, Goetz; Vladimir Kozlov > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Hi Goetz, > > On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> I tried to come up with a webrev that implements the change as proposed in >> your mails: >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >> >> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >> support_IRIW_for_not_multiple_copy_atomic_cpu. > > Given the flag name the commentary eg: > > + // Support ordering of "Independent Reads of Independent Writes". > + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { > > seems somewhat redundant. > >> I left the definition and handling of _wrote_volatile in the code, without >> any protection. > > + bool _wrote_volatile; // Did we write a final field? > > s/final/volatile > >> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >> and put it on one line. >> >> I removed the comment in library_call.cpp. >> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >> from the comments as it's PPC specific. > > I think the primary IRIW comment/explanation should go in > globalDefinitions.hpp where > support_IRIW_for_not_multiple_copy_atomic_cpu is defined. > >> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these >> issues in C1 if nobody did it by then. > > I've filed: > > https://bugs.openjdk.java.net/browse/JDK-8032366 > > "Implement C1 support for IRIW conformance on non-multiple-copy-atomic > platforms" > > to cover this task, as it may be needed sooner rather than later. > >> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >> performance problems arise, we still can add #ifdef PPC64 to circumvent this. > > Ok. > > Thanks, > David > >> Best regards, >> Goetz. 
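As a rough illustration of the flag-based guarding discussed above: the sketch below is not the HotSpot implementation. It only records, as strings, which barriers would surround a volatile access depending on the constant. The functions `volatile_load_sequence` and `volatile_store_sequence` are hypothetical; the constant name and the barrier placement follow what the thread describes (a full barrier before the load and none after the store when IRIW support is enabled).

```cpp
#include <string>
#include <vector>

// Compile-time constant in the style proposed in the thread.
// CPU_NOT_MULTIPLE_COPY_ATOMIC would be defined by the platform header.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical helper: barrier sequence around a volatile load.
std::vector<std::string> volatile_load_sequence(
    bool support_iriw = support_IRIW_for_not_multiple_copy_atomic_cpu) {
  std::vector<std::string> seq;
  if (support_iriw) {
    seq.push_back("MemBarVolatile");  // full barrier moved in front of the load
  }
  seq.push_back("load");
  seq.push_back("MemBarAcquire");
  return seq;
}

// Hypothetical helper: barrier sequence around a volatile store.
std::vector<std::string> volatile_store_sequence(
    bool support_iriw = support_IRIW_for_not_multiple_copy_atomic_cpu) {
  std::vector<std::string> seq;
  seq.push_back("MemBarRelease");
  seq.push_back("store");
  if (!support_iriw) {
    seq.push_back("MemBarVolatile");  // usual placement after the store
  }
  return seq;
}
```

Because the flag is a constant known at compile time, a compiler can fold the untaken branch away, which is the argument made in the thread for preferring it over scattered #ifdefs.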
>> >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Donnerstag, 16. Januar 2014 10:05 >> To: Vladimir Kozlov >> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>> On 1/16/14 12:34 AM, David Holmes wrote: >>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>> want to keep it this way, it could be useful to have such info on other >>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>> parse.hpp. >>>>> >>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>> which could be checked in all places instead of #ifdef: >>>> >>>> I asked for the ifdef some time back as I find it much preferable to >>>> have this as a build-time construct rather than a >>>> runtime one. I don't want to have to pay anything for this if we don't >>>> use it. >>> >>> Any decent C++ compiler will optimize expressions with such constants >>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >> >> If you insist then we may as well do it all the same way. Better to be >> consistent. >> >> My apologies Goetz for wasting your time going back and forth on this. >> >> That aside I have a further concern with this IRIW support - it is >> incomplete as there is no C1 support, as PPC64 isn't using client. If >> this is going on then we (which probably means the Oracle 'we') need to >> add the missing C1 code. 
>> >> David >> ----- >> >>> Vladimir >>> >>>> >>>> David >>>> >>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>>> #else >>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>>> #endif >>>>> >>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>>> >>>>> and then: >>>>> >>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>>> oop p = JNIHandles::resolve(obj); \ >>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>>> OrderAccess::fence(); \ >>>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>>> >>>>> And: >>>>> >>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>>> field->is_volatile()) { >>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>>> + } >>>>> >>>>> And so on. The comments will be needed only in globalDefinitions.hpp >>>>> >>>>> The code in parse1.cpp could be put on one line: >>>>> >>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>>> method()->is_initializer()) )) { >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> I updated the webrev: >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>> >>>>>>> - I removed the IRIW example in parse3.cpp >>>>>>> - I adapted the comments not to point to that comment, and to >>>>>>> reflect the new flagging. Also I mention that we support the >>>>>>> volatile constructor issue, but that it's not standard. >>>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>>> I also think it's better to separate these this way. >>>>>> >>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>>> please. >>>>>> >>>>>> One nit I missed before. 
In src/share/vm/opto/library_call.cpp this >>>>>> comment doesn't make much sense to me and refers to >>>>>> ppc specific stuff in a shared file: >>>>>> >>>>>> if (is_volatile) { >>>>>> ! if (!is_store) { >>>>>> insert_mem_bar(Op_MemBarAcquire); >>>>>> ! } else { >>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>>> insert_mem_bar(Op_MemBarVolatile); >>>>>> + #endif >>>>>> + } >>>>>> >>>>>> I don't think the comment is needed. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks for your comments! >>>>>>> >>>>>>> Best regards, >>>>>>> Goetz. >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>>>>> To: Lindenmaier, Goetz >>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>>> 'hotspot-dev at openjdk.java.net' >>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>> Independent Reads of Independent Writes >>>>>>> >>>>>>> Hi Goetz, >>>>>>> >>>>>>> Sorry for the delay in getting back to this. >>>>>>> >>>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>>> specifically it is >>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>>> the commentary excessive, particularly for shared code. In particular >>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>>> explanation and I don't think we need it to that level of detail. >>>>>>> Seems >>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>>> >>>>>>> The changes related to volatile writes in the constructor, as >>>>>>> discussed >>>>>>> are not required by the Java Memory Model. 
If you want to keep these >>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>> PPC64 porters. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>> this version. >>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Lindenmaier, Goetz >>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>>> To: David Holmes >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi David, >>>>>>>> >>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>> not >>>>>>>>> required to pass. >>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>> worked >>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>> it by >>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>>> part is >>>>>>>> not that performance relevant. >>>>>>>> >>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>> think >>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. 
This introduces >>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>> but this way I only have to change the ppc platform. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz >>>>>>>> >>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>>> emails. >>>>>>>> >>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>> settled. >>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>> >>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>> The change is about IRIW. >>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>> >>>>>>>> This is the part I still have some question marks over as the >>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>> But I'm >>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>> >>>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>>> stores. >>>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>>> constructor tests. >>>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>>> after stores >>>>>>>>> to pass the volatile constructor tests. 
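To make concrete which result step 1 is about, here is a small self-contained sketch that enumerates every sequentially consistent interleaving of the four IRIW threads and collects the values the two readers can observe. It is a model for illustration only (all names are hypothetical, and it does not model Power hardware): it shows that under sequential consistency the outcome "reader 1 sees x=1, y=0 while reader 2 sees y=1, x=0" cannot occur, which is the outcome volatile accesses must rule out.

```cpp
#include <array>
#include <set>

// Observed values: {T1 reads x, T1 reads y, T3 reads y, T3 reads x}.
using Outcome = std::array<int, 4>;

struct State {
  int x = 0;
  int y = 0;
  std::array<int, 4> pc{};  // index of the next operation of each thread
  Outcome obs{};
};

// Number of operations per thread: T0 writes x, T1 reads x then y,
// T2 writes y, T3 reads y then x.
static const int kOps[4] = {1, 2, 1, 2};

// Execute the next operation of thread t in program order.
void step(State& s, int t) {
  const int i = s.pc[t]++;
  switch (t) {
    case 0: s.x = 1; break;
    case 1: if (i == 0) s.obs[0] = s.x; else s.obs[1] = s.y; break;
    case 2: s.y = 1; break;
    case 3: if (i == 0) s.obs[2] = s.y; else s.obs[3] = s.x; break;
  }
}

// Depth-first enumeration of all interleavings that respect program order.
void explore(State s, std::set<Outcome>& out) {
  bool done = true;
  for (int t = 0; t < 4; ++t) {
    if (s.pc[t] < kOps[t]) {
      done = false;
      State next = s;
      step(next, t);
      explore(next, out);
    }
  }
  if (done) out.insert(s.obs);
}

std::set<Outcome> sc_outcomes() {
  std::set<Outcome> out;
  explore(State{}, out);
  return out;
}
```

Checking `sc_outcomes().count(Outcome{1, 0, 1, 0})` yields 0, while outcomes such as `{1, 1, 1, 1}` and `{0, 0, 0, 0}` are present.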
>>>>>>>> >>>>>>>> So we can at least undo #4 now we have established those tests >>>>>>>> were not >>>>>>>> required to pass. >>>>>>>> >>>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>>> order >>>>>>>>> instructions are not as fine-granular as the >>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>>> The only instruction >>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>>>> therefore the >>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>>> proper representation >>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>>> it's pointless >>>>>>>>> anyways. >>>>>>>>> >>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>>> I'd be happy to add a property >>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>> >>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>> think >>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>>> based not architecture based) as that will allow for turning this >>>>>>>> on/off for any architecture for testing purposes. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>>> Or also >>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>> >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Donnerstag, 28.
November 2013 00:34 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> TL;DR version: >>>>>>>>> >>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>> constructor-barrier >>>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>>> *may* >>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>> ensure >>>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>>> invalid >>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>> need it >>>>>>>>> due to other factors). >>>>>>>>> >>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>> term :) >>>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>>> >>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>> visible to >>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>> architectures >>>>>>>>> are not multiple-copy atomic." >>>>>>>>> >>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>> ARM and >>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>>> current issue is what you wrote below: >>>>>>>>> >>>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>>> > remove the sync instruction from behind stores >>>>>>>>> (parse3.cpp:320) >>>>>>>>> > and place it before loads. >>>>>>>>> >>>>>>>>> I hadn't grasped this part. 
Obviously if you fail to do the sync >>>>>>>>> after >>>>>>>>> the store then you have to do something around the loads to get the >>>>>>>>> same >>>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>>> load - >>>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>>> clearer. >>>>>>>>> >>>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>>> volatile_store were factored out so that they could be >>>>>>>>> implemented as >>>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>>> hooks >>>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>>> >>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>>> this >>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>> >>>>>>>>>> I did not yet add the >>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>> or >>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>> to reduce #defined, as I got no further comment on that. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>> >>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>> >>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>>> store >>>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>>> (volatile store in constructor). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> @David >>>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>> continuous. >>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>>> issue. >>>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>>> you >>>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>>> solution, >>>>>>>>>> we would volunteer to implement and test it. >>>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>>> issues. >>>>>>>>>> >>>>>>>>>> @David >>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>>> and >>>>>>>>>>> can't find any reference to it. 
>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>>> ARM and >>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>>> Work-Stealing for >>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>>>> taskqueue problem. >>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>>> >>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>>> used 'read' >>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>>> above). >>>>>>>>>> >>>>>>>>>> Best regards and thanks for all your involvements, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Wednesday, 27 November 2013 12:53 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi Goetz, >>>>>>>>>> >>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> -- Volatile in constructor >>>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>>> processors >>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>>> Aleksey!) >>>>>>>>>> >>>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>>> >>>>>>>>>>> -- IRIW issue >>>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>>> a few >>>>>>>>>>>> moments thought. >>>>>>>>>>> Sure.
There also is no solution as you require for the taskqueue >>>>>>>>>>> problem yet, >>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>> >>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>> continuous. >>>>>>>>>> >>>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>> different?) >>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>>> I contributed a >>>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>>> nice, I admit. >>>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>>> are evaluated >>>>>>>>>>> by the C-compiler I'm happy. >>>>>>>>>>> >>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>> problem >>>>>>>>>>> is that >>>>>>>>>>> store >>>>>>>>>>> sync >>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>> only >>>>>>>>>>> sync >>>>>>>>>>> load >>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>> pass that test. >>>>>>>>>> >>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>> term and >>>>>>>>>> can't find any reference to it. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>> The JMM is fine. And >>>>>>>>>>> store >>>>>>>>>>> MemBarVolatile >>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>> that >>>>>>>>>>> do what is required. >>>>>>>>>>> >>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Hi Goetz, >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi everybody, >>>>>>>>>>>> >>>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>>> I'll try to answer all in one mail. >>>>>>>>>>>> >>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>>> initializing >>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>>> Shipilev >>>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>>> LongVolatileTest >>>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>>> will fail. >>>>>>>>>>>> (In addition, observing 0 instead of the initial value of a >>>>>>>>>>>> volatile field would be >>>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>>> AtomicInteger.) >>>>>>>>>>> >>>>>>>>>>> The effects of unsafe publication are always surprising - >>>>>>>>>>> volatiles do >>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>>> that >>>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>>> Aleksey >>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>> >>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>>> heavyweight >>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>>> don't >>>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>>> >>>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>>> OrderAccess operations >>>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>>> account >>>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>>> different platforms >>>>>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>>>>> An example: >>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>> this: >>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>> fine grained operations: >>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>> can not implement >>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>> >>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>> issue. >>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>> read >>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. 
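The barrier mapping described in the mail above can be modelled in a few lines. This is a minimal, self-contained C++ sketch and not HotSpot code: the names `Ordering`, `PowerInsn`, `provided_by`, `select_power_barrier` and `drop_redundant_lwsync` are hypothetical. It only encodes what the mail states: lwsync orders LoadLoad, LoadStore and StoreStore, sync additionally orders StoreLoad, and an lwsync immediately followed by a sync adds nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ordering constraints as a bit mask (hypothetical names, following the
// LoadLoad/LoadStore/StoreStore/StoreLoad terminology used in the mail).
enum Ordering : std::uint8_t {
  LoadLoad   = 1 << 0,
  LoadStore  = 1 << 1,
  StoreStore = 1 << 2,
  StoreLoad  = 1 << 3
};

enum class PowerInsn { none, lwsync, sync };

// Constraints each Power barrier provides, as stated in the mail.
constexpr std::uint8_t provided_by(PowerInsn insn) {
  return insn == PowerInsn::sync   ? (LoadLoad | LoadStore | StoreStore | StoreLoad)
       : insn == PowerInsn::lwsync ? (LoadLoad | LoadStore | StoreStore)
                                   : 0;
}

// Pick the cheapest barrier that covers all requested constraints.
inline PowerInsn select_power_barrier(std::uint8_t required) {
  if (required == 0) return PowerInsn::none;
  if ((required & ~provided_by(PowerInsn::lwsync)) == 0) return PowerInsn::lwsync;
  return PowerInsn::sync;
}

// Peephole pass: drop every lwsync that is directly followed by a sync,
// because the sync already provides everything the lwsync does.
inline std::vector<PowerInsn> drop_redundant_lwsync(const std::vector<PowerInsn>& in) {
  std::vector<PowerInsn> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool followed_by_sync = i + 1 < in.size() && in[i + 1] == PowerInsn::sync;
    if (in[i] == PowerInsn::lwsync && followed_by_sync) continue;
    out.push_back(in[i]);
  }
  assert(out.size() <= in.size());  // the pass only ever removes barriers
  return out;
}
```

In this model, a MemBarRelease (LoadStore | StoreStore) selects lwsync and a MemBarVolatile (StoreLoad) selects sync; running the peephole over the pair leaves only the sync, which is the optimization the mail says is needed on Power.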
>>>>>>>>>>>> >>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>> independent? >>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>> >>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>> few >>>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>>> will >>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>>> moment. The >>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>> requirements for >>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>> are not >>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>> store >>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>> of the >>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>> software >>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>> don't have >>>>>>>>>>> that on ppc!) >>>>>>>>>>> >>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>> I will >>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> -- Other ports: >>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>> it might >>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>> them). 
>>>>>>>>>>>> >>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>> can >>>>>>>>>>>> prepare an extra webrev for hotspot-comp. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>> way :) >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>> >>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>> fix an >>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>> IRIW >>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>> doing the >>>>>>>>>>>> load. >>>>>>>>>>>> >>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>> would be >>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>> performance >>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>> that >>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>> this is >>>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>>> with the >>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>> the >>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>> properties >>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>> on a >>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>> reduce >>>>>>>>>>>> to a no-op. >>>>>>>>>>>> >>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>> under >>>>>>>>>>>> the JMM. >>>>>>>>>>>> >>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>> current >>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>> is set >>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>> >>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>> >>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>> >>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>> >>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>> Java >>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>> explicitly >>>>>>>>>>>> like this. >>>>>>>>>>>> >>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>>> there >>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>> defines? >>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>> David? >>>>>>>>>>>> >>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>>> specific which >>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>>> #ifdefs. >>>>>>>>>>>>> >>>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>>> >>>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>>> #else >>>>>>>>>>>>> if (is_field) { >>>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>>> volatile field >>>>>>>>>>>>> (PPC64). >>>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Vladimir >>>>>>>>>>>>> >>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> Example: >>>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>>> >>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>>> is only >>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>>> threads >>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. 
>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>>> fields >>>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>>> published. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>>> seems too >>>>>>>>>>>>>> strong. >>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>>> suffice. >>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>> From volker.simonis at gmail.com Tue Jan 21 09:49:07 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 21 Jan 2014 18:49:07 +0100 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: <52DD965A.1030505@oracle.com> References: <52DD10A1.3040202@oracle.com> <52DD965A.1030505@oracle.com> Message-ID: On Mon, Jan 20, 2014 at 10:34 PM, Alan Bateman wrote: > On 20/01/2014 19:57, Volker Simonis wrote: >> >> Hi Alan, >> >> I've tried your patch with our port on AIX. >> The good news is that it fixes: >> >> java/nio/channels/AsynchronousFileChannel/Lock.java >> >> on AIX as well. >> >> The bad news is, that it doesn't seem to help for: >> >> java/nio/channels/AsyncCloseAndInterrupt.java >> >> Here's a stack trace of where the VM gets stuck: > > In these stack traces then the channels are Pipe.SourceChannel or > Pipe.SinkChannel where the file descriptor is to one end of a pipe. Are > these the only cases where you see these hangs? 
I'm interested to know if > the async close of SocketChannel and ServerSocketChannel when configured > blocking also hangs (I will guess that it will as the behavior is likely to > be the same as pipe).

I also think they will hang, but I'm not sure how to test it. The java/nio/channels/AsynchronousServerSocketChannel and java/nio/channels/AsynchronousSocketChannel tests all pass, but I'm not sure if they test the same thing.

In the NIO area, I currently (with your change) have problems with the following tests:

java/nio/channels/AsyncCloseAndInterrupt.java (hangs)
java/nio/channels/AsynchronousChannelGroup/Basic.java (hangs sometimes)
java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java (hangs)
java/nio/channels/AsynchronousChannelGroup/Unbounded.java (hangs sometimes)
java/nio/channels/Selector/RacyDeregister.java (fails)

However, java/nio/channels/AsynchronousChannelGroup/Unbounded.java hangs in AixPollPort.pollset_poll() (which is our implementation of AsynchronousChannelGroup) so that may be a completely different problem. I'm currently trying to debug it.

> I have an idea on how to fix this so that the preClose isn't used when the > channel is configured blocking (or isn't registered with a Selector). The > patch doesn't use NativeThreadSet because it isn't efficient when the number > of threads is limited to 1 or 2 (FileChannel uses NativeThreadSet because it > defines positional read/write and so the number of concurrent reader/writers > is unlimited). >

Yes, I'm definitely interested to see and test your patch on AIX.

> I'll send a patch for the other channels soon and we can see if this works for > you. If it does work then I have a bit of a preference for bringing it in via > jdk9/dev rather than via the AIX staging forest because the changes impact > all platforms.

Yes, that's no problem. I think the class library for AIX will be fine and ready for integration into jdk9/dev without these changes. We can fix that later and backport it to 8u-dev as required.
> That said, if we being this FileChannel fix for OSX in then > it means that the platform specific changes would be in, the patch for the > other platforms shouldn't require porting. > > -Alan. > From vladimir.kozlov at oracle.com Tue Jan 21 12:00:18 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 21 Jan 2014 12:00:18 -0800 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8EC55@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> <52DE5FB0.5000808@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EC55@DEWDFEMB12A.global.corp.sap> Message-ID: <52DED1D2.1070203@oracle.com> Thanks. I am pushing it. Vladimir On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote: > Sorry, I missed that. fixed. > > Best regards, > Goetz. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 21. 
Januar 2014 12:53 > To: Lindenmaier, Goetz; Vladimir Kozlov > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Thanks Goetz! > > This typo still exists: > > + bool _wrote_volatile; // Did we write a final field? > > s/final/volatile/ > > Otherwise no further comments from me. > > David > > On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> I made a new webrev >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/ >> differing from >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >> only in the comments. >> >> I removed >> // Support ordering of "Independent Reads of Independent Writes". >> everywhere, and edited the comments in the globalDefinition*.hpp >> files. >> >> Best regards, >> Goetz. >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 21. Januar 2014 05:55 >> To: Lindenmaier, Goetz; Vladimir Kozlov >> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Hi Goetz, >> >> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I tried to come up with a webrev that implements the change as proposed in >>> your mails: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >>> >>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >>> support_IRIW_for_not_multiple_copy_atomic_cpu. >> >> Given the flag name the commentary eg: >> >> + // Support ordering of "Independent Reads of Independent Writes". >> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { >> >> seems somewhat redundant. >> >>> I left the definition and handling of _wrote_volatile in the code, without >>> any protection. >> >> + bool _wrote_volatile; // Did we write a final field? 
>> >> s/final/volatile >> >>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >>> and put it on one line. >>> >>> I removed the comment in library_call.cpp. >>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >>> from the comments as it's PPC specific. >> >> I think the primary IRIW comment/explanation should go in >> globalDefinitions.hpp where >> support_IRIW_for_not_multiple_copy_atomic_cpu is defined. >> >>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these >>> issues in C1 if nobody did it by then. >> >> I've filed: >> >> https://bugs.openjdk.java.net/browse/JDK-8032366 >> >> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic >> platforms" >> >> to cover this task, as it may be needed sooner rather than later. >> >>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >>> performance problems arise, we still can add #ifdef PPC64 to circumvent this. >> >> Ok. >> >> Thanks, >> David >> >>> Best regards, >>> Goetz. >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Donnerstag, 16. Januar 2014 10:05 >>> To: Vladimir Kozlov >>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>>> On 1/16/14 12:34 AM, David Holmes wrote: >>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>>> want to keep it this way, it could be useful to have such info on other >>>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>>> parse.hpp. 
>>>>>> >>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>>> which could be checked in all places instead of #ifdef: >>>>> >>>>> I asked for the ifdef some time back as I find it much preferable to >>>>> have this as a build-time construct rather than a >>>>> runtime one. I don't want to have to pay anything for this if we don't >>>>> use it. >>>> >>>> Any decent C++ compiler will optimize expressions with such constants >>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >>> >>> If you insist then we may as well do it all the same way. Better to be >>> consistent. >>> >>> My apologies Goetz for wasting your time going back and forth on this. >>> >>> That aside I have a further concern with this IRIW support - it is >>> incomplete as there is no C1 support, as PPC64 isn't using client. If >>> this is going on then we (which probably means the Oracle 'we') need to >>> add the missing C1 code. >>> >>> David >>> ----- >>> >>>> Vladimir >>>> >>>>> >>>>> David >>>>> >>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>>>> #else >>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>>>> #endif >>>>>> >>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>>>> >>>>>> and then: >>>>>> >>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>>>> oop p = JNIHandles::resolve(obj); \ >>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>>>> OrderAccess::fence(); \ >>>>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>>>> >>>>>> And: >>>>>> >>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>>>> field->is_volatile()) { >>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>>>> + } >>>>>> >>>>>> And so on. 
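The constant-instead-of-#ifdef pattern suggested above can be tried outside the VM. The following is a rough, self-contained C++ sketch under stated assumptions: `std::atomic` and `std::atomic_thread_fence` stand in for HotSpot's OrderAccess, and `full_fence`, `fence_count`, `volatile_load` and `volatile_store` are hypothetical helpers, not HotSpot functions. Only the constant name and the `CPU_NOT_MULTIPLE_COPY_ATOMIC` macro are taken from the thread.

```cpp
#include <atomic>
#include <cassert>

// One #ifdef in a single header-like place; everything else tests the constant.
// Build with -DCPU_NOT_MULTIPLE_COPY_ATOMIC to model a PPC64-like platform.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
constexpr bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
constexpr bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Counts the full fences issued, only so the placement can be observed.
inline int& fence_count() {
  static int n = 0;
  return n;
}

// Stand-in for OrderAccess::fence() (assumption: seq_cst fence is close enough
// for illustration).
inline void full_fence() {
  ++fence_count();
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Volatile-style read: the full fence goes BEFORE the load when IRIW support
// for non-multiple-copy-atomic CPUs is enabled.
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) full_fence();
  return field.load(std::memory_order_acquire);
}

// Volatile-style write: release store, followed by a full fence only when the
// load side does not already fence.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) full_fence();
}
```

Because the constant is known at compile time, an optimizing compiler can be expected to drop the untaken branch, which is the argument made in the thread for preferring it to scattered #ifdefs. In either configuration one store followed by one load issues exactly one full fence; only its position changes.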
The comments will be needed only in globalDefinitions.hpp >>>>>> >>>>>> The code in parse1.cpp could be put on one line: >>>>>> >>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>>>> method()->is_initializer()) )) { >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> I updated the webrev: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> >>>>>>>> - I removed the IRIW example in parse3.cpp >>>>>>>> - I adapted the comments not to point to that comment, and to >>>>>>>> reflect the new flagging. Also I mention that we support the >>>>>>>> volatile constructor issue, but that it's not standard. >>>>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>>>> I also think it's better to separate these this way. >>>>>>> >>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>>>> please. >>>>>>> >>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>>>> comment doesn't make much sense to me and refers to >>>>>>> ppc specific stuff in a shared file: >>>>>>> >>>>>>> if (is_volatile) { >>>>>>> ! if (!is_store) { >>>>>>> insert_mem_bar(Op_MemBarAcquire); >>>>>>> ! } else { >>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>>>> insert_mem_bar(Op_MemBarVolatile); >>>>>>> + #endif >>>>>>> + } >>>>>>> >>>>>>> I don't think the comment is needed. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> Thanks for your comments! >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Mittwoch, 15. 
Januar 2014 01:55 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>>>> 'hotspot-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi Goetz, >>>>>>>> >>>>>>>> Sorry for the delay in getting back to this. >>>>>>>> >>>>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>>>> specifically it is >>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>>>> the commentary excessive, particularly for shared code. In particular >>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>>>> explanation and I don't think we need it to that level of detail. >>>>>>>> Seems >>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>>>> >>>>>>>> The changes related to volatile writes in the constructor, as >>>>>>>> discussed >>>>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>>> PPC64 porters. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>>> this version. >>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Lindenmaier, Goetz >>>>>>>>> Sent: Freitag, 20. 
Dezember 2013 13:47 >>>>>>>>> To: David Holmes >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>>> not >>>>>>>>>> required to pass. >>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>>> worked >>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>>> it by >>>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>>>> part is >>>>>>>>> not that performance relevant. >>>>>>>>> >>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>>> think >>>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>>> but this way I only have to change the ppc platform. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz >>>>>>>>> >>>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Freitag, 20. 
Dezember 2013 05:58 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>>> vacation :) Next vacation (i.e. next two weeks) I'll continue to check >>>>>>>>> emails. >>>>>>>>> >>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>>> settled. >>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>>> >>>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>>> The change is about IRIW. >>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>>> >>>>>>>>> This is the part I still have some question marks over as the >>>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>>> But I'm >>>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>>> >>>>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>>>> stores. >>>>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>>>> constructor tests. >>>>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>>>> after stores >>>>>>>>>> to pass the volatile constructor tests. >>>>>>>>> >>>>>>>>> So we can at least undo #4 now we have established those tests >>>>>>>>> were not >>>>>>>>> required to pass. >>>>>>>>> >>>>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>>>> order >>>>>>>>>> instructions are not as fine-grained as the >>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>>>> The only instruction >>>>>>>>>> on PPC that does StoreLoad is sync.
But sync also does StoreStore, >>>>>>>>>> therefore the >>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>>>> proper representation >>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>>>> it's pointless >>>>>>>>>> anyways. >>>>>>>>>> >>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>>>> I'd be happy to add a property >>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>> >>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>> think >>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>>>> based not architecture based) as that will allows for turning this >>>>>>>>> on/off for any architecture for testing purposes. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>>>> Or also >>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> TL;DR version: >>>>>>>>>> >>>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>>> constructor-barrier >>>>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>>>> *may* >>>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>>> ensure >>>>>>>>>> you can't see uninitialized fields. 
So the tests for this are >>>>>>>>>> invalid >>>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>>> need it >>>>>>>>>> due to other factors). >>>>>>>>>> >>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>>> term :) >>>>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>>>> >>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>>> visible to >>>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>>> architectures >>>>>>>>>> are not multiple-copy atomic." >>>>>>>>>> >>>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>>> ARM and >>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>>>> current issue is what you wrote below: >>>>>>>>>> >>>>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>>>> > remove the sync instruction from behind stores >>>>>>>>>> (parse3.cpp:320) >>>>>>>>>> > and place it before loads. >>>>>>>>>> >>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>>>> after >>>>>>>>>> the store then you have to do something around the loads to get the >>>>>>>>>> same >>>>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>>>> load - >>>>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>>>> clearer. >>>>>>>>>> >>>>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>>>> hooks are present. 
It would be better if volatile_load and >>>>>>>>>> volatile_store were factored out so that they could be >>>>>>>>>> implemented as >>>>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>>>> hooks >>>>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>>>> >>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>>>> this >>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>> >>>>>>>>>>> I did not yet add the >>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> or >>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>>> >>>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>>> >>>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>>> and place it before loads. 
Then there is no sync between volatile store
>>>>>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>>>>>> (volatile store in constructor).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> @David
>>>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>> continuous.
>>>>>>>>>>> That's not true, we did a lot of investigation and testing on this issue.
>>>>>>>>>>> And we came up with a solution we consider the best possible. If you
>>>>>>>>>>> have objections, you should at least give the draft of a better solution,
>>>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>>>>>>>>
>>>>>>>>>>> @David
>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
>>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the
>>>>>>>>>>> taskqueue problem.
>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>>>
>>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>>>>> used 'read' instead. Sorry for that. (I also fixed that in the method name
>>>>>>>>>>> above).
>>>>>>>>>>>
>>>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>>>> Goetz.
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> Hi Goetz, >>>>>>>>>>> >>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi David, >>>>>>>>>>>> >>>>>>>>>>>> -- Volatile in constuctor >>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>>>> processors >>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>>>> Aleksey!) >>>>>>>>>>> >>>>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>>>> >>>>>>>>>>>> -- IRIW issue >>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>>>> a few >>>>>>>>>>>>> moments thought. >>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>>>> problem yet, >>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>> >>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>> continuous. >>>>>>>>>>> >>>>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>>> different?) >>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>>>> I contributed a >>>>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>>>> nice, I admit. >>>>>>>>>>>> (I don't really know about ARM, though). 
>>>>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>>>> are evaluated >>>>>>>>>>>> by the C-compiler I'm happy. >>>>>>>>>>>> >>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>>> problem >>>>>>>>>>>> is that >>>>>>>>>>>> store >>>>>>>>>>>> sync >>>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>>> only >>>>>>>>>>>> sync >>>>>>>>>>>> load >>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>>> pass that test. >>>>>>>>>>> >>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>>> term and >>>>>>>>>>> can't find any reference to it. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>> The JMM is fine. And >>>>>>>>>>>> store >>>>>>>>>>>> MemBarVolatile >>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>>> that >>>>>>>>>>>> do what is required. >>>>>>>>>>>> >>>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Hi Goetz, >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi everybody, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks a lot for the detailed reviews! 
>>>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after initializing
>>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev
>>>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>>>> will fail.
>>>>>>>>>>>>> (In addition, observing 0 instead of the initial value of a
>>>>>>>>>>>>> volatile field would be
>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>>>
>>>>>>>>>>>> The effects of unsafe publication are always surprising - volatiles do
>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM that
>>>>>>>>>>>> requires the constructor barrier - discussions with Doug and Aleksey
>>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>
>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>>>> heavyweight
>>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We don't
>>>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>>>> Basically, I agree on this. But you also have to take into account
>>>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>>>> different platforms
>>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>>> An example: >>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>>> this: >>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>>> fine grained operations: >>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>>> can not implement >>>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>>> >>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>>> issue. >>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>>> read >>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>>>> >>>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>>> independent? >>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>>> >>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>>> few >>>>>>>>>>>> moments thought. 
You are implying there is a problem here that >>>>>>>>>>>> will >>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>>>> moment. The >>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>>> requirements for >>>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>>> are not >>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>>> store >>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>>> of the >>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>>> software >>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>>> don't have >>>>>>>>>>>> that on ppc!) >>>>>>>>>>>> >>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>>> I will >>>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> >>>>>>>>>>>>> -- Other ports: >>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>>> it might >>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>>> them). >>>>>>>>>>>>> >>>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>>> can >>>>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>>> >>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>>> way :) >>>>>>>>>>>>> >>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>>> >>>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>>> fix an >>>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>>> IRIW >>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>>> doing the >>>>>>>>>>>>> load. >>>>>>>>>>>>> >>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>>> would be >>>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>>> performance >>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>>> that >>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>>> this is >>>>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>>>> with the >>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>>> the >>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>>> properties >>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>>> on a >>>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>>> reduce >>>>>>>>>>>>> to a no-op. >>>>>>>>>>>>> >>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>>> under >>>>>>>>>>>>> the JMM. >>>>>>>>>>>>> >>>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>>> current >>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>>> is set >>>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>>> >>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>>> >>>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>>> >>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>>> >>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>>> Java >>>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>>> explicitly >>>>>>>>>>>>> like this. >>>>>>>>>>>>> >>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>>>> there >>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>>> defines? >>>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>>> David? >>>>>>>>>>>>> >>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   if (is_vol) {
>>>>>>>>>>>>>>     // See comment in do_get_xxx().
>>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>>>     insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>     if (is_field) {
>>>>>>>>>>>>>>       // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>>>>>>>>>>       set_wrote_volatile(true);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>>   volatile x=0, y=0
>>>>>>>>>>>>>>>    __________    __________    __________    __________
>>>>>>>>>>>>>>>   | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>>>>>>>>>                   read(y)                     read(x)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>>>>>>>>>> doing the loads.
Thus we implement volatile read as >>>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>>>> fields >>>>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>>>> published. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>>>> seems too >>>>>>>>>>>>>>> strong. >>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>>>> suffice. >>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>>> From Alan.Bateman at oracle.com Tue Jan 21 12:41:03 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 21 Jan 2014 20:41:03 +0000 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: References: <52DD10A1.3040202@oracle.com> <52DD965A.1030505@oracle.com> Message-ID: <52DEDB5F.6050301@oracle.com> On 21/01/2014 17:49, Volker Simonis wrote: > : > I also think they will hang, but I'm not sure how to test it. The > java/nio/channels/AsynchronousServerSocketChannel and > java/nio/channels/AsynchronousSocketChannel all pass, but I'm not sure > if they test the same thing. 
>
> In the NIO area, I currently (with your change) have problems with the
> following tests:
>
> java/nio/channels/AsyncCloseAndInterrupt.java (hangs)
> java/nio/channels/AsynchronousChannelGroup/Basic.java (hangs sometimes)
> java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java (hangs)
> java/nio/channels/AsynchronousChannelGroup/Unbounded.java (hangs sometimes)
> java/nio/channels/Selector/RacyDeregister.java (fails)
>
> However, java/nio/channels/AsynchronousChannelGroup/Unbounded.java
> hangs in AixPollPort.pollset_poll() (which is our implementation of
> AsynchronousChannelGroup) so that may be a completely different
> problem. I'm currently trying to debug it.

The SelectableChannel and AsynchronousChannel implementations are very different. In the SelectableChannel implementations, closing is complicated due to the possibility of threads being blocked in I/O operations. From the mails it is clear that AIX hangs in dup2, but an alternative approach that initially signals the blocked threads should work there. One of the reasons for not agreeing to calling into the NET_* functions is that it results in double accounting; the selectable channels already track it. I'll send a patch soon to try and we can see about resolving this once the changes are in jdk9/dev.

On testing: AsyncCloseAndInterrupt will close and interrupt on each of the channels, so it is a useful test. I don't know what to say about the AsynchronousChannelGroup tests that are hanging; I think I'd need to see the full stack trace. Closing of these channels is cooperative as there isn't any blocking, so it's much simpler. So is portset_poll your implementation of Port.startPoll?

> :
> Yes, that's no problem. I think the class library for AIX will be fine
> and ready for integration into jdk9/dev without these changes. We can
> fix that later and backport it to 8u-dev as required.

I think this makes sense.

-Alan.
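For readers following the 8029101 thread in this archive: the review converges on a compile-time constant, support_IRIW_for_not_multiple_copy_atomic_cpu, that decides whether the full fence goes before a volatile load (PPC64) or after a volatile store (other platforms). The following is only a minimal standalone sketch of that idea, assuming std::atomic as a stand-in for HotSpot's OrderAccess; volatile_load, volatile_store, full_fence and the full_fences_issued counter are hypothetical names introduced for illustration and do not appear in the webrev.

```cpp
#include <atomic>
#include <cassert>

// In the thread, CPU_NOT_MULTIPLE_COPY_ATOMIC is defined only in the PPC
// platform header. Here it is simply left undefined unless the build sets it.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical counter, present only so the effect of the flag is observable.
static int full_fences_issued = 0;

// Hypothetical stand-in for OrderAccess::fence(): a full two-way barrier.
inline void full_fence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ++full_fences_issued;
}

// Volatile read: on a CPU that is not multiple-copy atomic, the full fence
// is issued in the reading thread, followed by a load-acquire.
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    full_fence();
  }
  return field.load(std::memory_order_acquire);
}

// Volatile write: release-store; the trailing full fence is kept only on
// platforms where it was not moved in front of the loads.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
    full_fence();
  }
}
```

Because the flag is a constant known at compile time, a compiler can drop the untaken branch, which is the argument made in the thread for preferring it over scattered #ifdefs. In this sketch one store followed by one load issues exactly one full fence under either setting; only its position changes.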
From vladimir.kozlov at oracle.com Tue Jan 21 16:10:23 2014 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 22 Jan 2014 00:10:23 +0000 Subject: hg: ppc-aix-port/stage-9/hotspot: 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Message-ID: <20140122001025.F3D4162629@hg.openjdk.java.net> Changeset: c6d7e7406136 Author: goetz Date: 2014-01-16 14:25 +0100 URL: http://hg.openjdk.java.net/ppc-aix-port/stage-9/hotspot/rev/c6d7e7406136 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Reviewed-by: dholmes, kvn Contributed-by: martin.doerr at sap.com ! src/cpu/ppc/vm/globalDefinitions_ppc.hpp ! src/share/vm/interpreter/bytecodeInterpreter.cpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/parse.hpp ! src/share/vm/opto/parse1.cpp ! src/share/vm/opto/parse3.cpp ! src/share/vm/prims/unsafe.cpp ! src/share/vm/utilities/globalDefinitions.hpp From goetz.lindenmaier at sap.com Wed Jan 22 01:20:47 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 22 Jan 2014 09:20:47 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52DED1D2.1070203@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <5295DD0B.3030604@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6CE61@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> 
<4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> <52DE5FB0.5000808@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EC55@DEWDFEMB12A.global.corp.sap> <52DED1D2.1070203@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8EF85@DEWDFEMB12A.global.corp.sap> Hi Vladimir, Thanks for testing and pushing! Will you push this also to stage? I assume we can handle this as the other three hotspot changes, without a new bug-id? Also, when do you think we (you unfortunately) should update the repos again? Stage-9 maybe after Volkers last change is submitted? Best regards, Goetz -----Original Message----- From: hotspot-dev-bounces at openjdk.java.net [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Dienstag, 21. Januar 2014 21:00 Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Thanks. I am pushing it. Vladimir On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote: > Sorry, I missed that. fixed. > > Best regards, > Goetz. > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 21. Januar 2014 12:53 > To: Lindenmaier, Goetz; Vladimir Kozlov > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Thanks Goetz! > > This typo still exists: > > + bool _wrote_volatile; // Did we write a final field? > > s/final/volatile/ > > Otherwise no further comments from me. > > David > > On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> I made a new webrev >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/ >> differing from >> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >> only in the comments. 
>> >> I removed >> // Support ordering of "Independent Reads of Independent Writes". >> everywhere, and edited the comments in the globalDefinition*.hpp >> files. >> >> Best regards, >> Goetz. >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 21. Januar 2014 05:55 >> To: Lindenmaier, Goetz; Vladimir Kozlov >> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Hi Goetz, >> >> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I tried to come up with a webrev that implements the change as proposed in >>> your mails: >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >>> >>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >>> support_IRIW_for_not_multiple_copy_atomic_cpu. >> >> Given the flag name the commentary eg: >> >> + // Support ordering of "Independent Reads of Independent Writes". >> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { >> >> seems somewhat redundant. >> >>> I left the definition and handling of _wrote_volatile in the code, without >>> any protection. >> >> + bool _wrote_volatile; // Did we write a final field? >> >> s/final/volatile >> >>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >>> and put it on one line. >>> >>> I removed the comment in library_call.cpp. >>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >>> from the comments as it's PPC specific. >> >> I think the primary IRIW comment/explanation should go in >> globalDefinitions.hpp where >> support_IRIW_for_not_multiple_copy_atomic_cpu is defined. >> >>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these >>> issues in C1 if nobody did it by then. 
>> >> I've filed: >> >> https://bugs.openjdk.java.net/browse/JDK-8032366 >> >> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic >> platforms" >> >> to cover this task, as it may be needed sooner rather than later. >> >>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >>> performance problems arise, we still can add #ifdef PPC64 to circumvent this. >> >> Ok. >> >> Thanks, >> David >> >>> Best regards, >>> Goetz. >>> >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Donnerstag, 16. Januar 2014 10:05 >>> To: Vladimir Kozlov >>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>>> On 1/16/14 12:34 AM, David Holmes wrote: >>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>>> want to keep it this way, it could be useful to have such info on other >>>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>>> parse.hpp. >>>>>> >>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>>> which could be checked in all places instead of #ifdef: >>>>> >>>>> I asked for the ifdef some time back as I find it much preferable to >>>>> have this as a build-time construct rather than a >>>>> runtime one. I don't want to have to pay anything for this if we don't >>>>> use it. >>>> >>>> Any decent C++ compiler will optimize expressions with such constants >>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >>> >>> If you insist then we may as well do it all the same way. 
Better to be >>> consistent. >>> >>> My apologies Goetz for wasting your time going back and forth on this. >>> >>> That aside I have a further concern with this IRIW support - it is >>> incomplete as there is no C1 support, as PPC64 isn't using client. If >>> this is going on then we (which probably means the Oracle 'we') need to >>> add the missing C1 code. >>> >>> David >>> ----- >>> >>>> Vladimir >>>> >>>>> >>>>> David >>>>> >>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>>>> #else >>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>>>> #endif >>>>>> >>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>>>> >>>>>> and then: >>>>>> >>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>>>> oop p = JNIHandles::resolve(obj); \ >>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>>>> OrderAccess::fence(); \ >>>>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>>>> >>>>>> And: >>>>>> >>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>>>> field->is_volatile()) { >>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>>>> + } >>>>>> >>>>>> And so on. The comments will be needed only in globalDefinitions.hpp >>>>>> >>>>>> The code in parse1.cpp could be put on one line: >>>>>> >>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>>>> method()->is_initializer()) )) { >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> I updated the webrev: >>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>> >>>>>>>> - I removed the IRIW example in parse3.cpp >>>>>>>> - I adapted the comments not to point to that comment, and to >>>>>>>> reflect the new flagging. 
Also I mention that we support the >>>>>>>> volatile constructor issue, but that it's not standard. >>>>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>>>> I also think it's better to separate these this way. >>>>>>> >>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>>>> please. >>>>>>> >>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>>>> comment doesn't make much sense to me and refers to >>>>>>> ppc specific stuff in a shared file: >>>>>>> >>>>>>> if (is_volatile) { >>>>>>> ! if (!is_store) { >>>>>>> insert_mem_bar(Op_MemBarAcquire); >>>>>>> ! } else { >>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>>>> insert_mem_bar(Op_MemBarVolatile); >>>>>>> + #endif >>>>>>> + } >>>>>>> >>>>>>> I don't think the comment is needed. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> Thanks for your comments! >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Goetz. >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>>>>>> To: Lindenmaier, Goetz >>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>>>> 'hotspot-dev at openjdk.java.net' >>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>> Independent Reads of Independent Writes >>>>>>>> >>>>>>>> Hi Goetz, >>>>>>>> >>>>>>>> Sorry for the delay in getting back to this. >>>>>>>> >>>>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>>>> specifically it is >>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>>>> the commentary excessive, particularly for shared code. 
In particular >>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>>>> explanation and I don't think we need it to that level of detail. >>>>>>>> Seems >>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>>>> >>>>>>>> The changes related to volatile writes in the constructor, as >>>>>>>> discussed >>>>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>>> PPC64 porters. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>>> this version. >>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Lindenmaier, Goetz >>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>>>> To: David Holmes >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>>> not >>>>>>>>>> required to pass. >>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>>> worked >>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>>> it by >>>>>>>>> a flag as proposed. But if you insist I will remove it. 
Also, this >>>>>>>>> part is >>>>>>>>> not that performance relevant. >>>>>>>>> >>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>>> think >>>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>>> but this way I only have to change the ppc platform. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz >>>>>>>>> >>>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>>>> emails. >>>>>>>>> >>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>>> settled. >>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>>> >>>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>>> The change is about IRIW. >>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>>> >>>>>>>>> This is the part I still have some question marks over as the >>>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>>> But I'm >>>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>>> >>>>>>>>>> 2. 
If we do syncs before loads, we don't need to do them after stores.
>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile constructor tests.
>>>>>>>>>> 4. So finally we added them again at the end of the constructor after stores
>>>>>>>>>>    to pass the volatile constructor tests.
>>>>>>>>>
>>>>>>>>> So we can at least undo #4 now we have established those tests were not
>>>>>>>>> required to pass.
>>>>>>>>>
>>>>>>>>>> We originally passed the constructor tests because the ppc memory order
>>>>>>>>>> instructions are not as fine-grained as the
>>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction
>>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the
>>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The proper representation
>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless
>>>>>>>>>> anyway.
>>>>>>>>>>
>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>>> I'd be happy to add a property
>>>>>>>>>>     OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>
>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I think
>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>>>>> based not architecture based) as that allows for turning this
>>>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>>>>>> Or also
>>>>>>>>>>     OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>     VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Goetz.
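
The two barrier placements discussed in this exchange can be sketched outside HotSpot with standard C++ atomics. This is only an illustration under stated assumptions, not the code from the webrev: volatile_load and volatile_store are hypothetical helpers, std::atomic and std::atomic_thread_fence stand in for the VM's OrderAccess calls and MemBar nodes, and the constant mirrors the compile-time guard proposed in the thread. The sketch shows where the full fence is placed in each configuration; it makes no claim about what the C++ memory model itself guarantees for IRIW.

```cpp
#include <atomic>

// Compile with -DCPU_NOT_MULTIPLE_COPY_ATOMIC to select the PPC64-style placement.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical helper: volatile read.
// With the guard set, a full fence precedes the load ("sync; load; acquire").
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return field.load(std::memory_order_acquire);
}

// Hypothetical helper: volatile write.
// Without the guard, the full (StoreLoad) fence follows the store instead,
// which is the conventional placement on the other platforms.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
}
```

Because the guard is a compile-time constant, a compiler can drop the untaken branch entirely, which is the argument made in the thread for using a constant rather than scattering #ifdefs.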
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> TL;DR version: >>>>>>>>>> >>>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>>> constructor-barrier >>>>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>>>> *may* >>>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>>> ensure >>>>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>>>> invalid >>>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>>> need it >>>>>>>>>> due to other factors). >>>>>>>>>> >>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>>> term :) >>>>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>>>> >>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>>> visible to >>>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>>> architectures >>>>>>>>>> are not multiple-copy atomic." >>>>>>>>>> >>>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>>> ARM and >>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>>> after the stores that need to be visible. 
I think the crux of the
>>>>>>>>>> current issue is what you wrote below:
>>>>>>>>>>
>>>>>>>>>> > The fixes for the constructor issue are only needed because we
>>>>>>>>>> > remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>>> > and place it before loads.
>>>>>>>>>>
>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync after
>>>>>>>>>> the store then you have to do something around the loads to get the same
>>>>>>>>>> results! I still don't know what led you to the conclusion that the
>>>>>>>>>> only way to fix the IRIW issue was to put the fence before the load -
>>>>>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>>>>>> clearer.
>>>>>>>>>>
>>>>>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>>>>>> variables. You now want to customize that but not all the requisite
>>>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>>>> volatile_store were factored out so that they could be implemented as
>>>>>>>>>> desired per-platform. Alternatively there could be pre- and post-hooks
>>>>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>>>>> platform-specific ifdefs to handle it as per your patch.
>>>>>>>>>>
>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think this
>>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>> >>>>>>>>>>> I did not yet add the >>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> or >>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>>> >>>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>>> >>>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>>>> store >>>>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>>>> (volatile store in constructor). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> @David >>>>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>>> continuous. >>>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>>>> issue. 
>>>>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>>>> you >>>>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>>>> solution, >>>>>>>>>>> we would volunteer to implement and test it. >>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>>>> issues. >>>>>>>>>>> >>>>>>>>>>> @David >>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>>>> and >>>>>>>>>>>> can't find any reference to it. >>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>>>> ARM and >>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>>>> Work-Stealing for >>>>>>>>>>> Weak Memory Models" by Nhat Minh L?, Antoniu Pop, Albert Cohen >>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the >>>>>>>>>>> taskqueue problem. >>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>>>> >>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>>>> used 'read' >>>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>>>> above). >>>>>>>>>>> >>>>>>>>>>> Best regards and thanks for all your involvements, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Mittwoch, 27. 
November 2013 12:53
>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>
>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> -- Volatile in constructor
>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 processors
>>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>>>>>>>>>>
>>>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>>>
>>>>>>>>>>>> -- IRIW issue
>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a few
>>>>>>>>>>>>> moments thought.
>>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue problem yet,
>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>
>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>> continuous.
>>>>>>>>>>>
>>>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>>>> different?)
>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a
>>>>>>>>>>>> solution with the #defines, and that's correct for all, but not nice, I admit.
>>>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>>>> So if I can write down a nicer solution testing for methods that are evaluated
>>>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>>> >>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>>> problem >>>>>>>>>>>> is that >>>>>>>>>>>> store >>>>>>>>>>>> sync >>>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>>> only >>>>>>>>>>>> sync >>>>>>>>>>>> load >>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>>> pass that test. >>>>>>>>>>> >>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>>> term and >>>>>>>>>>> can't find any reference to it. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>> The JMM is fine. And >>>>>>>>>>>> store >>>>>>>>>>>> MemBarVolatile >>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>>> that >>>>>>>>>>>> do what is required. >>>>>>>>>>>> >>>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Hi Goetz, >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi everybody, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>>>> I'll try to answer to all in one mail. 
>>>>>>>>>>>>> >>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>>>> initializing >>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>>>> Shipilev >>>>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>>>> LongVolatileTest >>>>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>>>> will fail. >>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>>>>>> volatile field would be >>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>>>> AtomicInteger.) >>>>>>>>>>>> >>>>>>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>>>>>> volatiles do >>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>>>> that >>>>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>>>> Aleksey >>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>>> >>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>>>> heavyweight >>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>>>> don't >>>>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>>>> >>>>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>>>> OrderAccess operations >>>>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>>>> account >>>>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>>>> different platforms >>>>>>>>>>>>> just implementing something empty is not sufficient. 
>>>>>>>>>>>>> An example: >>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>>> this: >>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>>> fine grained operations: >>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>>> can not implement >>>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>>> >>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>>> issue. >>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>>> read >>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>>>> >>>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>>> independent? >>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>>> >>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>>> few >>>>>>>>>>>> moments thought. 
You are implying there is a problem here that >>>>>>>>>>>> will >>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>>>> moment. The >>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>>> requirements for >>>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>>> are not >>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>>> store >>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>>> of the >>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>>> software >>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>>> don't have >>>>>>>>>>>> that on ppc!) >>>>>>>>>>>> >>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>>> I will >>>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> >>>>>>>>>>>>> -- Other ports: >>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>>> it might >>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>>> them). >>>>>>>>>>>>> >>>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>>> can >>>>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
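
The redundancy described in this mail for Power (an lwsync immediately followed by a sync) can be illustrated with a small sketch. The names Ordering, Barrier, kLwsync, kSync and drop_subsumed are hypothetical and chosen only for this example; the orderings attributed to lwsync and sync are the ones listed in the mail. This is not the optimization implemented in the port, just a model of why the lwsync becomes superfluous when a sync follows it directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The four orderings named in the thread, as bit flags.
enum Ordering : std::uint8_t {
  LoadLoad = 1, LoadStore = 2, StoreStore = 4, StoreLoad = 8
};

// A barrier instruction together with the set of orderings it enforces.
struct Barrier {
  const char* name;
  std::uint8_t mask;
};

// MemBarRelease maps to lwsync on Power, MemBarVolatile maps to sync.
const Barrier kLwsync = {"lwsync", LoadLoad | LoadStore | StoreStore};
const Barrier kSync   = {"sync",   LoadLoad | LoadStore | StoreStore | StoreLoad};

// Drop a barrier when the barrier directly after it enforces every ordering
// it enforces. Only adjacent barriers (no memory access in between) are
// considered, and the check only looks forward.
std::vector<Barrier> drop_subsumed(const std::vector<Barrier>& in) {
  std::vector<Barrier> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool has_next = i + 1 < in.size();
    const bool subsumed = has_next && (in[i].mask & ~in[i + 1].mask) == 0;
    if (!subsumed) {
      out.push_back(in[i]);
    }
  }
  return out;
}
```

On sparc the two membar masks are disjoint, so neither barrier would be dropped by such a check; on Power the lwsync mask is a subset of the sync mask, which is why the pair collapses to a single sync.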
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>>> >>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>>> way :) >>>>>>>>>>>>> >>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>>> >>>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>>> fix an >>>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>>> IRIW >>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>>> doing the >>>>>>>>>>>>> load. >>>>>>>>>>>>> >>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>>> would be >>>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>>> performance >>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>>> that >>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>>> this is >>>>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>>>> with the >>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>>> the >>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>>> properties >>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>>> on a >>>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>>> reduce >>>>>>>>>>>>> to a no-op. >>>>>>>>>>>>> >>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>>> under >>>>>>>>>>>>> the JMM. >>>>>>>>>>>>> >>>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>>> current >>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>>> is set >>>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>>> >>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>>> >>>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>>> >>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>>> >>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>>> Java >>>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>>> explicitly >>>>>>>>>>>>> like this. >>>>>>>>>>>>> >>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>>>> there >>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>>> defines? >>>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>>> David? >>>>>>>>>>>>> >>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     if (is_vol) {
>>>>>>>>>>>>>>       // See comment in do_get_xxx().
>>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>>>       insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>       if (is_field) {
>>>>>>>>>>>>>>         // Add MemBarRelease for constructors which write volatile field (PPC64).
>>>>>>>>>>>>>>         set_wrote_volatile(true);
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I prepared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>>>>  __________    __________    __________    __________
>>>>>>>>>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>>>>>>>>>                 read(y)                     read(x)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  Disallowed:    x=1, y=0                    y=1, x=0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>>>>>>>>>> doing the loads.
Thus we implement volatile read as >>>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>>>> fields >>>>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>>>> published. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>>>> seems too >>>>>>>>>>>>>>> strong. >>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>>>> suffice. >>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>>> From goetz.lindenmaier at sap.com Wed Jan 22 06:01:14 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 22 Jan 2014 14:01:14 +0000 Subject: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization In-Reply-To: <52DDE01D.1060605@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE707E1@DEWDFEMB12A.global.corp.sap> <52B3A3AF.9050609@oracle.com> <52D76E6F.8070504@oracle.com> <52D821CB.6020207@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D010@DEWDFEMB12A.global.corp.sap> <52DC8DA9.2010106@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8D974@DEWDFEMB12A.global.corp.sap> <52DDE01D.1060605@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8F125@DEWDFEMB12A.global.corp.sap> Hi, thanks for the hint to this paper, David! 
We modified the known read-after-write test to use the serialization page mechanism, and reading that paper we were able to understand it is correct on X86 / TSO. Thanks to Andreas Schoesser for this. http://cr.openjdk.java.net/~goetz/raw_serialization_page/raw_s11n_page.cpp We ran the test on linux ppc and aix over the weekend, without failure. This indicates the mechanism works there, too. The test fails immediately if the "write" in the fast thread is omitted. We also checked that both threads make considerable progress, and the fast thread does not always trap. Also, we looked at documentation about how mprotect etc. should be implemented on ppc. A mprotect should do a tlbie, invalidating the TLBs of all processors. It also should do an eieio and a tlbsync. This assures the TLB of the "fast" process doing only the write is invalidated before mprotect returns. So the fast thread will experience a TLB-miss on the write. We assume on the TLB-miss some synchronization instruction is used, probably the ptesync. This should assure the fast thread gets the new value even if the page is already writable again. Unfortunately this is implemented in supervisor code, which is not available to us. I'll add a patch that enables UseMembar in our VM to test the performance impacts of that variant. As I don't see a good reason this flag is product, I'll change it to debug for the test. As you described, the flag should be set depending on whether the serialization page works. So nowadays it serves as a platform configuration so I think debug is the better choice. (Btw, you could move UsePPCLWSYNC into your globals_ppc.hpp with the new platform-dependent flags.) Best regards, Goetz. -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Dienstag, 21. 
Januar 2014 03:49 To: Lindenmaier, Goetz; Vladimir Kozlov Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers' Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization On 20/01/2014 11:41 PM, Lindenmaier, Goetz wrote: > Hi David, > > I understand your arguments and basically agree with them. > If the serialization page does not work on PPC, your solution > 1) is best. > > But I'm not sure why there should be a link between TSO and whether > the serialization page trick works. Second depends, as you say, > on the OS implementation. My limited understanding is that on RMO-based systems the requirements for: "Synchronization based on page-protections - mprotect()" as described in: http://home.comcast.net/~pjbishop/Dave/Asymmetric-Dekker-Synchronization.txt may not always hold. Dave Dice would need to provide more details if needed. Cheers, David > I would assume that the write to the serialization page causes > the OS to generate a new TLB entry or the like, involving the > use of a ptwsync instruction which would be fine. But we are > investigating this. > > We are also experimenting with a small regression test to find > out whether the serialization page ever fails. > > Best regards, > Goetz. > > > > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Montag, 20. Januar 2014 03:45 > To: Lindenmaier, Goetz; Vladimir Kozlov > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers' > Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization > > On 17/01/2014 11:30 PM, Lindenmaier, Goetz wrote: >> Hi, >> >> I had a look at the first part of this issue: Whether StoreStore >> is necessary in the interpreter. Let's for now assume the serialization >> page mechanism works on PPC. 
>> >> In the state transition leaving the VM state, which is executed in the >> destructor, ThreadStateTransition::transition() is called, which executes >> if (UseMembar) { >> OrderAccess::fence(); >> } else { >> os::write_memory_serialize_page(thread); >> } >> >> os:: write_memory_serialize_page() can not be considered a proper >> MemBar, as it only serializes if another thread poisoned the page. >> Thus it does not qualify to order the initialization and the publishing >> of the object. >> >> You are right, if UseMembar is true, the StoreStore in the interpreter >> is superfluous. We could guard the StoreStores in the interpreter by >> !UseMembar. > > My understanding, from our existing non-TSO system ports, is that the > present assumption is that either: > > a) you have a TSO system, in which case you are probably using the > serialization page, but you don't need any barrier to enforce ordering > anyway; or > > b) you don't have a TSO system, you are using UseMembar==true and so you > get a full fence inserted that enforces the ordering anyway. > > So the ordering requirements are satisfied by piggy-backing on the > UseMembar setting that comes from the thread state transition code, > which forms part of the "runtime entry" code. That's not to say that you > will necessarily find this applied consistently in all places where it > might be applied - nor will you necessarily find that this is common > knowledge amongst VM engineers. > > Technically the storeStore barriers could be conditional on !UseMembar > but that is redundant in the current usage. > >> But then again, one is to order the publishing of the thread states, >> the other to enforce some Java semantics. I don't know whether everybody >> who changes in one place is aware of both issues. But if you want to, >> I'll add a !UseMembar in the interpreter. > > Here are my preferred options in order: > > 1. 
Set UseMembar==true on PPC64 and drop these new storeStore barriers - > rely on the piggy-backing effect. > > 2. Conditionalize the new storeStore barriers on !UseMembar. This > unfortunately penalizes all platforms with a runtime check. > > 3. Add the storeStores unconditionally. This penalizes platforms that > set UseMembar==true as we will now get two fences at runtime. > > I know we're talking about the interpreter here so performance is not > exactly critical, but still ... > >> Maybe it would be a good idea >> to document the double use in interfaceSupport.cpp, too. And maybe >> add an assertion of some kind. > > interfaceSupport doesn't know that other code piggy-backs on the fact > state-transitions have full fences when UseMembar is true. If it is > documented anywhere it should be in the interpreter (and any other > places that make the same assumption) - something like: > > // On non-TSO systems there can be additional ordering constraints > // between Java-level actions (such as allocation and constructor > // invocation) that in principle need explicit memory barriers. > // However, on many non-TSO systems the thread-state transition logic > // in the IRT_ENTRY code will insert a full fence due to the use of > // UseMembar==true, which provides the necessary ordering guarantees. > >> >> We're digging into the other issue currently, whether the serialization page >> works on ppc. We understand your concerns and have no simple answer to >> it right now. At least, in our VM and in the port there are no known problems >> with the state transitions. > > Even if the memory serialization page does not work, in a guaranteed > sense, on PPC-AIX, it is extremely unlikely that testing would expose > this. Also note that the memory serialization page behaviour is more a > function of the OS - so it may be that AIX is different to linux in that > regard. > > Cheers, > David > ----- > >> Best regards, >> Goetz. 
>> >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Donnerstag, 16. Januar 2014 19:16 >> To: David Holmes; Lindenmaier, Goetz >> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers' >> Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization >> >> Changes are in C++ Interpreter so it does not affect Oracle VM. >> But David has point here. I would like to hear the explanation too. >> >> BTW, I see that for ppc64: >> >> src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false); >> >> as result write_memory_serialize_page() is used in >> ThreadStateTransition::transition(). >> >> Is it not enough on PPC64? >> >> Thanks, >> Vladimir >> >> On 1/15/14 9:30 PM, David Holmes wrote: >>> Can I get some response on this please - specifically the redundancy wrt >>> IRT_ENTRY actions. >>> >>> Thanks, >>> David >>> >>> On 20/12/2013 11:55 AM, David Holmes wrote: >>>> Still catching up ... >>>> >>>> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote: >>>>> Hi, >>>>> >>>>> this change adds StoreStore barriers after object initialization and >>>>> after constructor calls in the C++ interpreter. This assures no >>>>> uninitialized >>>>> objects or final fields are visible. >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/ >>>> >>>> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize >>>> thread state transitions that already include a full "fence" so the >>>> storestore barriers are redundant in those cases. >>>> >>>> The fastpath _new storestore seems okay. >>>> >>>> I don't know how handle_return gets used to know if it is reasonable or >>>> not. 
>>>> >>>> I was trying, unsuccessfully, to examine the same code in the >>>> templateInterpreter to see how it handles these cases as it naturally >>>> has the same object-initialization-safety requirements (though these can >>>> be handled in a number of different ways other than an unconditional >>>> storestore barrier at the end of the initialization and construction >>>> phases). >>>> >>>> David >>>> ----- >>>> >>>>> Please review and test this change. >>>>> >>>>> Best regards, >>>>> Goetz. >>>>> From vladimir.kozlov at oracle.com Wed Jan 22 09:02:04 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 22 Jan 2014 09:02:04 -0800 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CE8EF85@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> <52DE5FB0.5000808@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EC55@DEWDFEMB12A.global.corp.sap> <52DED1D2.1070203@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EF85@DEWDFEMB12A.global.corp.sap> Message-ID: <52DFF98C.8010001@oracle.com> Hi Goetz On 1/22/14 1:20 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > Thanks for testing and pushing!
> > Will you push this also to stage? I assume we can handle this > as the other three hotspot changes, without a new bug-id? Yes, I will backport it. What about JDK changes Volker pushed: 8028537, 8031134, 8031997 and new one from today 8031581? Should I backport all of them into 8u stage? From conversion between Volker and Alan some of them need backport a fix from jdk9. Or I am mistaking? > > Also, when do you think we (you unfortunately) should update > the repos again? Stage-9 maybe after Volkers last change is submitted? After I test and push 8031581 I will do sync with latest jdk9 sources (b01). I will build bundles and give them to SQE for final testing. Thanks, Vladimir > > Best regards, > Goetz > > > > -----Original Message----- > From: hotspot-dev-bounces at openjdk.java.net [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Dienstag, 21. Januar 2014 21:00 > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Thanks. I am pushing it. > > Vladimir > > On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote: >> Sorry, I missed that. fixed. >> >> Best regards, >> Goetz. >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 21. Januar 2014 12:53 >> To: Lindenmaier, Goetz; Vladimir Kozlov >> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Thanks Goetz! >> >> This typo still exists: >> >> + bool _wrote_volatile; // Did we write a final field? >> >> s/final/volatile/ >> >> Otherwise no further comments from me. 
>> >> David >> >> On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I made a new webrev >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/ >>> differing from >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >>> only in the comments. >>> >>> I removed >>> // Support ordering of "Independent Reads of Independent Writes". >>> everywhere, and edited the comments in the globalDefinition*.hpp >>> files. >>> >>> Best regards, >>> Goetz. >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Dienstag, 21. Januar 2014 05:55 >>> To: Lindenmaier, Goetz; Vladimir Kozlov >>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> I tried to come up with a webrev that implements the change as proposed in >>>> your mails: >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >>>> >>>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >>>> support_IRIW_for_not_multiple_copy_atomic_cpu. >>> >>> Given the flag name the commentary eg: >>> >>> + // Support ordering of "Independent Reads of Independent Writes". >>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { >>> >>> seems somewhat redundant. >>> >>>> I left the definition and handling of _wrote_volatile in the code, without >>>> any protection. >>> >>> + bool _wrote_volatile; // Did we write a final field? >>> >>> s/final/volatile >>> >>>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >>>> and put it on one line. >>>> >>>> I removed the comment in library_call.cpp. >>>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >>>> from the comments as it's PPC specific. 
>>> >>> I think the primary IRIW comment/explanation should go in >>> globalDefinitions.hpp where >>> support_IRIW_for_not_multiple_copy_atomic_cpu is defined. >>> >>>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these >>>> issues in C1 if nobody did it by then. >>> >>> I've filed: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8032366 >>> >>> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic >>> platforms" >>> >>> to cover this task, as it may be needed sooner rather than later. >>> >>>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >>>> performance problems arise, we still can add #ifdef PPC64 to circumvent this. >>> >>> Ok. >>> >>> Thanks, >>> David >>> >>>> Best regards, >>>> Goetz. >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Donnerstag, 16. Januar 2014 10:05 >>>> To: Vladimir Kozlov >>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>>>> On 1/16/14 12:34 AM, David Holmes wrote: >>>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>>>> want to keep it this way, it could be useful to have such info on other >>>>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>>>> parse.hpp. >>>>>>> >>>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>>>> which could be checked in all places instead of #ifdef: >>>>>> >>>>>> I asked for the ifdef some time back as I find it much preferable to >>>>>> have this as a build-time construct rather than a >>>>>> runtime one. 
I don't want to have to pay anything for this if we don't >>>>>> use it. >>>>> >>>>> Any decent C++ compiler will optimize expressions with such constants >>>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>>>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >>>> >>>> If you insist then we may as well do it all the same way. Better to be >>>> consistent. >>>> >>>> My apologies Goetz for wasting your time going back and forth on this. >>>> >>>> That aside I have a further concern with this IRIW support - it is >>>> incomplete as there is no C1 support, as PPC64 isn't using client. If >>>> this is going on then we (which probably means the Oracle 'we') need to >>>> add the missing C1 code. >>>> >>>> David >>>> ----- >>>> >>>>> Vladimir >>>>> >>>>>> >>>>>> David >>>>>> >>>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>>>>> #else >>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>>>>> #endif >>>>>>> >>>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>>>>> >>>>>>> and then: >>>>>>> >>>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>>>>> oop p = JNIHandles::resolve(obj); \ >>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>>>>> OrderAccess::fence(); \ >>>>>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>>>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>>>>> >>>>>>> And: >>>>>>> >>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>>>>> field->is_volatile()) { >>>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>>>>> + } >>>>>>> >>>>>>> And so on. 
The comments will be needed only in globalDefinitions.hpp >>>>>>> >>>>>>> The code in parse1.cpp could be put on one line: >>>>>>> >>>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>>>>> method()->is_initializer()) )) { >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> I updated the webrev: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>> >>>>>>>>> - I removed the IRIW example in parse3.cpp >>>>>>>>> - I adapted the comments not to point to that comment, and to >>>>>>>>> reflect the new flagging. Also I mention that we support the >>>>>>>>> volatile constructor issue, but that it's not standard. >>>>>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>>>>> I also think it's better to separate these this way. >>>>>>>> >>>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>>>>> please. >>>>>>>> >>>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>>>>> comment doesn't make much sense to me and refers to >>>>>>>> ppc specific stuff in a shared file: >>>>>>>> >>>>>>>> if (is_volatile) { >>>>>>>> ! if (!is_store) { >>>>>>>> insert_mem_bar(Op_MemBarAcquire); >>>>>>>> ! } else { >>>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>>>>> insert_mem_bar(Op_MemBarVolatile); >>>>>>>> + #endif >>>>>>>> + } >>>>>>>> >>>>>>>> I don't think the comment is needed. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>>> Thanks for your comments! >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Mittwoch, 15. 
Januar 2014 01:55 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>>>>> 'hotspot-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> Sorry for the delay in getting back to this. >>>>>>>>> >>>>>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>>>>> specifically it is >>>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>>>>> the commentary excessive, particularly for shared code. In particular >>>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>>>>> explanation and I don't think we need it to that level of detail. >>>>>>>>> Seems >>>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. >>>>>>>>> >>>>>>>>> The changes related to volatile writes in the constructor, as >>>>>>>>> discussed >>>>>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>>>> PPC64 porters. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>>>> this version. >>>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Lindenmaier, Goetz >>>>>>>>>> Sent: Freitag, 20. 
Dezember 2013 13:47 >>>>>>>>>> To: David Holmes >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>>>> not >>>>>>>>>>> required to pass. >>>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>>>> worked >>>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>>>> it by >>>>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>>>>> part is >>>>>>>>>> not that performance relevant. >>>>>>>>>> >>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>>>> think >>>>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>>>> but this way I only have to change the ppc platform. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz >>>>>>>>>> >>>>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Freitag, 20. 
Dezember 2013 05:58 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>>>>> emails. >>>>>>>>>> >>>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>>>> settled. >>>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>>>> >>>>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>>>> The change is about IRIW. >>>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>>>> >>>>>>>>>> This is the part I still have some question marks over as the >>>>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>>>> But I'm >>>>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>>>> >>>>>>>>>>> 2. If we do syncs before loads, we don't need to do them after >>>>>>>>>>> stores. >>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>>>>> constructor tests. >>>>>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>>>>> after stores >>>>>>>>>>> to pass the volatile constructor tests. >>>>>>>>>> >>>>>>>>>> So we can at least undo #4 now we have established those tests >>>>>>>>>> were not >>>>>>>>>> required to pass. >>>>>>>>>> >>>>>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>>>>> order >>>>>>>>>>> instructions are not as fine-grained as the >>>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>>>>> The only instruction >>>>>>>>>>> on PPC that does StoreLoad is sync.
But sync also does StoreStore, >>>>>>>>>>> therefore the >>>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>>>>> proper representation >>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>>>>> it's pointless >>>>>>>>>>> anyways. >>>>>>>>>>> >>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>>>>> I'd be happy to add a property >>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>> >>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>>> think >>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>>>>> based not architecture based) as that will allows for turning this >>>>>>>>>> on/off for any architecture for testing purposes. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>>>>> Or also >>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> TL;DR version: >>>>>>>>>>> >>>>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>>>> constructor-barrier >>>>>>>>>>> for volatiles is not required as part of the JMM specification. 
It >>>>>>>>>>> *may* >>>>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>>>> ensure >>>>>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>>>>> invalid >>>>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>>>> need it >>>>>>>>>>> due to other factors). >>>>>>>>>>> >>>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>>>> term :) >>>>>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>>>>> >>>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>>>> visible to >>>>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>>>> architectures >>>>>>>>>>> are not multiple-copy atomic." >>>>>>>>>>> >>>>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>>>> ARM and >>>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>>>> after the stores that need to be visible. I think the crux of the >>>>>>>>>>> current issue is what you wrote below: >>>>>>>>>>> >>>>>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>>>>> > remove the sync instruction from behind stores >>>>>>>>>>> (parse3.cpp:320) >>>>>>>>>>> > and place it before loads. >>>>>>>>>>> >>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>>>>> after >>>>>>>>>>> the store then you have to do something around the loads to get the >>>>>>>>>>> same >>>>>>>>>>> results! I still don't know what lead you to the conclusion that the >>>>>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>>>>> load - >>>>>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>>>>> clearer. >>>>>>>>>>> >>>>>>>>>>> So ... 
the basic problem is that the current structure in the VM has >>>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>>>>> volatile_store were factored out so that they could be >>>>>>>>>>> implemented as >>>>>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>>>>> hooks >>>>>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>>>>> >>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>>>>> this >>>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>>>>> abstractions are a confused mess in my opinion. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>> >>>>>>>>>>>> I did not yet add the >>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>> or >>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>>>> >>>>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>>>> both tests show behavior a programmer would expect. 
It's bad if >>>>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>>>> >>>>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>>>>> store >>>>>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>>>>> (volatile store in constructor). >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> @David >>>>>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>>>> continuous. >>>>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>>>>> issue. >>>>>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>>>>> you >>>>>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>>>>> solution, >>>>>>>>>>>> we would volunteer to implement and test it. >>>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>>>>> issues. >>>>>>>>>>>> >>>>>>>>>>>> @David >>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>>>>> and >>>>>>>>>>>>> can't find any reference to it. >>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>>>>> ARM and >>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>>>>> Work-Stealing for >>>>>>>>>>>> Weak Memory Models" by Nhat Minh L?, Antoniu Pop, Albert Cohen >>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the >>>>>>>>>>>> taskqueue problem. 
>>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>>>>> >>>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>>>>> used 'read' >>>>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>>>>> above). >>>>>>>>>>>> >>>>>>>>>>>> Best regards and thanks for all your involvement, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53 >>>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Hi Goetz, >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi David, >>>>>>>>>>>>> >>>>>>>>>>>>> -- Volatile in constructor >>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>>>>> processors >>>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>>>>> Aleksey!) >>>>>>>>>>>> >>>>>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>>>>> >>>>>>>>>>>>> -- IRIW issue >>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>>>>> a few >>>>>>>>>>>>>> moments thought. >>>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>>>>> problem yet, >>>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>>> >>>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>>> continuous.
>>>>>>>>>>>> >>>>>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>>>> different?) >>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>>>>> I contributed a >>>>>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>>>>> nice, I admit. >>>>>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>>>>> are evaluated >>>>>>>>>>>>> by the C-compiler I'm happy. >>>>>>>>>>>>> >>>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>>>> problem >>>>>>>>>>>>> is that >>>>>>>>>>>>> store >>>>>>>>>>>>> sync >>>>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>>>> only >>>>>>>>>>>>> sync >>>>>>>>>>>>> load >>>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>>>> pass that test. >>>>>>>>>>>> >>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>>>> term and >>>>>>>>>>>> can't find any reference to it. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>> The JMM is fine. And >>>>>>>>>>>>> store >>>>>>>>>>>>> MemBarVolatile >>>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>>>> that >>>>>>>>>>>>> do what is required. >>>>>>>>>>>>> >>>>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. 
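[Editorial sketch, not part of the original thread.] The IRIW shape discussed in this message can be written as a small litmus test. The version below is only an analogy in standard C++: it assumes that sequentially consistent `std::atomic` accesses play the role of Java volatile fields, and the function name `count_iriw_violations` is made up for illustration. It does not use any HotSpot code or the torture test suite mentioned in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the IRIW shape `rounds` times and counts how often the two readers
// disagree about the order in which the two independent writes happened.
// With seq_cst accesses that outcome is forbidden, so the count stays 0.
int count_iriw_violations(int rounds) {
  int violations = 0;
  for (int i = 0; i < rounds; ++i) {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    std::thread writer_x([&] { x.store(1); });  // "Thread 0": write(x=1)
    std::thread writer_y([&] { y.store(1); });  // "Thread 2": write(y=1)
    std::thread reader_xy([&] {                 // "Thread 1": read(x), read(y)
      r1 = x.load();
      r2 = y.load();
    });
    std::thread reader_yx([&] {                 // "Thread 3": read(y), read(x)
      r3 = y.load();
      r4 = x.load();
    });

    writer_x.join();
    writer_y.join();
    reader_xy.join();
    reader_yx.join();

    // Disallowed result: one reader saw x before y, the other y before x.
    if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
      ++violations;
    }
  }
  return violations;
}
```

On a machine that is not multiple-copy atomic, it is the compiler's job to pick barriers that keep this count at zero; where those barriers go (after the store or before the load) is exactly what the thread is arguing about.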
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Goetz, >>>>>>>>>>>>> >>>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>> Hi everybody, >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>>>>> I'll try to answer to all in one mail. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>>>>> initializing >>>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>>>>> Shipilev >>>>>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>>>>> LongVolatileTest >>>>>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>>>>> will fail. >>>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a >>>>>>>>>>>>>> volatile field would be >>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>>>>> AtomicInteger.) >>>>>>>>>>>>> >>>>>>>>>>>>> The affects of unsafe publication are always surprising - >>>>>>>>>>>>> volatiles do >>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>>>>> that >>>>>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>>>>> Aleksey >>>>>>>>>>>>> notwithstanding. 
AFAIK we have not seen those tests fail due to a >>>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>>>> >>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>>>>> heavyweight >>>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>>>>> don't >>>>>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>>>>> OrderAccess operations >>>>>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>>>>> account >>>>>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>>>>> different platforms >>>>>>>>>>>>>> just implementing something empty is not sufficient. >>>>>>>>>>>>>> An example: >>>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>>>> this: >>>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>>>> fine grained operations: >>>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>>>> can not implement >>>>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>>>> issue. 
>>>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>>>> read >>>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>>>> independent? >>>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>>>> >>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>>>> few >>>>>>>>>>>>> moments thought. You are implying there is a problem here that >>>>>>>>>>>>> will >>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>>>>> moment. The >>>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>>>> requirements for >>>>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>>>> are not >>>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>>>> store >>>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>>>> of the >>>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>>>> software >>>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>>>> don't have >>>>>>>>>>>>> that on ppc!) >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>>>> I will >>>>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Goetz. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> -- Other ports: >>>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>>>> it might >>>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>>>> them). >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>>>> can >>>>>>>>>>>>>> prepare an extra webrev for hotspot-comp. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>>>> >>>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>>>> way :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>>>> >>>>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>>>> fix an >>>>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>>>> IRIW >>>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>>>> doing the >>>>>>>>>>>>>> load. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>>>> would be >>>>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>>>> that >>>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>>>> this is >>>>>>>>>>>>>> polluting the shared code. My concern is similar to what I said >>>>>>>>>>>>>> with the >>>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>>>> the >>>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>>>> properties >>>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>>>> on a >>>>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>>>> reduce >>>>>>>>>>>>>> to a no-op. >>>>>>>>>>>>>> >>>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>>>> under >>>>>>>>>>>>>> the JMM. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>>>> current >>>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>>>> is set >>>>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>>>> >>>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>>>> Java >>>>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>>>> explicitly >>>>>>>>>>>>>> like this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? 
Are >>>>>>>>>>>>>> there >>>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>>>> defines? >>>>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>>>> David? >>>>>>>>>>>>>> >>>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>>>> inside else {}. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>>>>> specific which >>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>>>>> #ifdefs. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>>>>> #else >>>>>>>>>>>>>>> if (is_field) { >>>>>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>>>>> volatile field >>>>>>>>>>>>>>> (PPC64). 
>>>>>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Vladimir >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the >>>>>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Example: >>>>>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>>>>> is only >>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>>>>> threads >>>>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as >>>>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store. >>>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync. >>>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple >>>>>>>>>>>>>>>> SIGSEGVs >>>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Further this change contains a fix that assures that volatile >>>>>>>>>>>>>>>> fields >>>>>>>>>>>>>>>> written in constructors are visible before the reference gets >>>>>>>>>>>>>>>> published. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, >>>>>>>>>>>>>>>> seems too >>>>>>>>>>>>>>>> strong. 
>>>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should >>>>>>>>>>>>>>>> suffice. >>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please review and test this change. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>>>> From goetz.lindenmaier at sap.com Wed Jan 22 09:20:13 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 22 Jan 2014 17:20:13 +0000 Subject: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes In-Reply-To: <52DFF98C.8010001@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CE6883E@DEWDFEMB12A.global.corp.sap> <52968167.4050906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE6D7CA@DEWDFEMB12A.global.corp.sap> <52B3CE56.9030205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE720F9@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC2CE8C35D@DEWDFEMB12A.global.corp.sap> <52D5DC80.1040003@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8C5AB@DEWDFEMB12A.global.corp.sap> <52D76D50.60700@oracle.com> <52D78697.2090408@oracle.com> <52D79982.4060100@oracle.com> <52D79E61.1060801@oracle.com> <52D7A0A9.6070208@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8CF70@DEWDFEMB12A.global.corp.sap> <52DDFD9D.3050205@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EBA7@DEWDFEMB12A.global.corp.sap> <52DE5FB0.5000808@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EC55@DEWDFEMB12A.global.corp.sap> <52DED1D2.1070203@oracle.com> <4295855A5C1DE049A61835A1887419CC2CE8EF85@DEWDFEMB12A.global.corp.sap> <52DFF98C.8010001@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CE8F332@DEWDFEMB12A.global.corp.sap> Hi Vladimir, > Yes, I will backport it. That's good, thank you! > I will build bundles and give them to SQE for final testing. __final__ That's even better, that's great!!! Best regards, Goetz. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Mittwoch, 22. 
Januar 2014 18:02 To: Lindenmaier, Goetz Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes Hi Goetz On 1/22/14 1:20 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > Thanks for testing and pushing! > > Will you push this also to stage? I assume we can handle this > as the other three hotspot changes, without a new bug-id? Yes, I will backport it. What about JDK changes Volker pushed: 8028537, 8031134, 8031997 and new one from today 8031581? Should I backport all of them into 8u stage? From conversion between Volker and Alan some of them need backport a fix from jdk9. Or I am mistaking? > > Also, when do you think we (you unfortunately) should update > the repos again? Stage-9 maybe after Volkers last change is submitted? After I test and push 8031581 I will do sync with latest jdk9 sources (b01). I will build bundles and give them to SQE for final testing. Thanks, Vladimir > > Best regards, > Goetz > > > > -----Original Message----- > From: hotspot-dev-bounces at openjdk.java.net [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Dienstag, 21. Januar 2014 21:00 > Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' > Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes > > Thanks. I am pushing it. > > Vladimir > > On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote: >> Sorry, I missed that. fixed. >> >> Best regards, >> Goetz. >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 21. Januar 2014 12:53 >> To: Lindenmaier, Goetz; Vladimir Kozlov >> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >> >> Thanks Goetz! 
>> >> This typo still exists: >> >> + bool _wrote_volatile; // Did we write a final field? >> >> s/final/volatile/ >> >> Otherwise no further comments from me. >> >> David >> >> On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I made a new webrev >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/ >>> differing from >>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >>> only in the comments. >>> >>> I removed >>> // Support ordering of "Independent Reads of Independent Writes". >>> everywhere, and edited the comments in the globalDefinition*.hpp >>> files. >>> >>> Best regards, >>> Goetz. >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Dienstag, 21. Januar 2014 05:55 >>> To: Lindenmaier, Goetz; Vladimir Kozlov >>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>> >>> Hi Goetz, >>> >>> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> I tried to come up with a webrev that implements the change as proposed in >>>> your mails: >>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/ >>>> >>>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use >>>> support_IRIW_for_not_multiple_copy_atomic_cpu. >>> >>> Given the flag name the commentary eg: >>> >>> + // Support ordering of "Independent Reads of Independent Writes". >>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) { >>> >>> seems somewhat redundant. >>> >>>> I left the definition and handling of _wrote_volatile in the code, without >>>> any protection. >>> >>> + bool _wrote_volatile; // Did we write a final field? >>> >>> s/final/volatile >>> >>>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() , >>>> and put it on one line. >>>> >>>> I removed the comment in library_call.cpp. 
>>>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire." >>>> from the comments as it's PPC specific. >>> >>> I think the primary IRIW comment/explanation should go in >>> globalDefinitions.hpp where >>> support_IRIW_for_not_multiple_copy_atomic_cpu is defined. >>> >>>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these >>>> issues in C1 if nobody did it by then. >>> >>> I've filed: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8032366 >>> >>> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic >>> platforms" >>> >>> to cover this task, as it may be needed sooner rather than later. >>> >>>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any >>>> performance problems arise, we still can add #ifdef PPC64 to circumvent this. >>> >>> Ok. >>> >>> Thanks, >>> David >>> >>>> Best regards, >>>> Goetz. >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Donnerstag, 16. Januar 2014 10:05 >>>> To: Vladimir Kozlov >>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes >>>> >>>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote: >>>>> On 1/16/14 12:34 AM, David Holmes wrote: >>>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote: >>>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to >>>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I >>>>>>> want to keep it this way, it could be useful to have such info on other >>>>>>> platforms too. But I would suggest to remove PPC64 comments in >>>>>>> parse.hpp. 
>>>>>>> >>>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value >>>>>>> which could be checked in all places instead of #ifdef: >>>>>> >>>>>> I asked for the ifdef some time back as I find it much preferable to >>>>>> have this as a build-time construct rather than a >>>>>> runtime one. I don't want to have to pay anything for this if we don't >>>>>> use it. >>>>> >>>>> Any decent C++ compiler will optimize expressions with such constants >>>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really >>>>> don't like the code with #ifdef in unsafe.cpp but I can live with it. >>>> >>>> If you insist then we may as well do it all the same way. Better to be >>>> consistent. >>>> >>>> My apologies Goetz for wasting your time going back and forth on this. >>>> >>>> That aside I have a further concern with this IRIW support - it is >>>> incomplete as there is no C1 support, as PPC64 isn't using client. If >>>> this is going on then we (which probably means the Oracle 'we') need to >>>> add the missing C1 code. 
>>>> >>>> David >>>> ----- >>>> >>>>> Vladimir >>>>> >>>>>> >>>>>> David >>>>>> >>>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true; >>>>>>> #else >>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false; >>>>>>> #endif >>>>>>> >>>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever >>>>>>> >>>>>>> and then: >>>>>>> >>>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \ >>>>>>> oop p = JNIHandles::resolve(obj); \ >>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) >>>>>>> OrderAccess::fence(); \ >>>>>>> volatile type_name v = OrderAccess::load_acquire((volatile >>>>>>> type_name*)index_oop_from_field_offset_long(p, offset)); >>>>>>> >>>>>>> And: >>>>>>> >>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu && >>>>>>> field->is_volatile()) { >>>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier >>>>>>> + } >>>>>>> >>>>>>> And so on. The comments will be needed only in globalDefinitions.hpp >>>>>>> >>>>>>> The code in parse1.cpp could be put on one line: >>>>>>> >>>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() && >>>>>>> method()->is_initializer()) )) { >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 1/15/14 9:25 PM, David Holmes wrote: >>>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> I updated the webrev: >>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>> >>>>>>>>> - I removed the IRIW example in parse3.cpp >>>>>>>>> - I adapted the comments not to point to that comment, and to >>>>>>>>> reflect the new flagging. Also I mention that we support the >>>>>>>>> volatile constructor issue, but that it's not standard. >>>>>>>>> - I protected issuing the barrier for the constructor by PPC64. >>>>>>>>> I also think it's better to separate these this way. 
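[Editorial sketch, not part of the original thread.] Vladimir's suggestion of a compile-time constant instead of scattered `#ifdef`s can be illustrated roughly as follows. The constant name and the `CPU_NOT_MULTIPLE_COPY_ATOMIC` macro come from the messages above; the helpers `full_fence`, `volatile_load` and `volatile_store` are hypothetical stand-ins built on `std::atomic`, not HotSpot's `OrderAccess` or C2 code.

```cpp
#include <atomic>
#include <cassert>

// One #ifdef in a header; every other site tests an ordinary constant.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif

// Hypothetical stand-in for a full (StoreLoad-capable) barrier.
inline void full_fence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Volatile read: on a non-multiple-copy-atomic CPU the heavy barrier is
// placed before the load (the "sync-load-acquire" scheme from the thread).
template <typename T>
T volatile_load(const std::atomic<T>& field) {
  if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
    full_fence();
  }
  return field.load(std::memory_order_acquire);
}

// Volatile write: release-store, followed by the heavy barrier only when
// it has not been moved to the load side.
template <typename T>
void volatile_store(std::atomic<T>& field, T value) {
  field.store(value, std::memory_order_release);
  if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
    full_fence();
  }
}
```

Because the flag is a constant known at compile time, an optimizing compiler can typically drop the untaken branch, which is the argument made above for preferring this over `#ifdef` at each use.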
>>>>>>>> >>>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field >>>>>>>> declaration and all uses to be guarded by ifdef PPC64 too >>>>>>>> please. >>>>>>>> >>>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this >>>>>>>> comment doesn't make much sense to me and refers to >>>>>>>> ppc specific stuff in a shared file: >>>>>>>> >>>>>>>> if (is_volatile) { >>>>>>>> ! if (!is_store) { >>>>>>>> insert_mem_bar(Op_MemBarAcquire); >>>>>>>> ! } else { >>>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC >>>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire. >>>>>>>> insert_mem_bar(Op_MemBarVolatile); >>>>>>>> + #endif >>>>>>>> + } >>>>>>>> >>>>>>>> I don't think the comment is needed. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>>> Thanks for your comments! >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Goetz. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55 >>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; >>>>>>>>> 'hotspot-dev at openjdk.java.net' >>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>> Independent Reads of Independent Writes >>>>>>>>> >>>>>>>>> Hi Goetz, >>>>>>>>> >>>>>>>>> Sorry for the delay in getting back to this. >>>>>>>>> >>>>>>>>> The general changes to the volatile barriers to support IRIW are okay. >>>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more >>>>>>>>> specifically it is >>>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of >>>>>>>>> the commentary excessive, particularly for shared code. In particular >>>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the >>>>>>>>> explanation and I don't think we need it to that level of detail. >>>>>>>>> Seems >>>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate. 
>>>>>>>>> >>>>>>>>> The changes related to volatile writes in the constructor, as >>>>>>>>> discussed >>>>>>>>> are not required by the Java Memory Model. If you want to keep these >>>>>>>>> then I think they should all be guarded with PPC64 because it is not >>>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the >>>>>>>>> PPC64 porters. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I updated this webrev. I detected a small flaw I made when editing >>>>>>>>>> this version. >>>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line. >>>>>>>>>> I also based the webrev on the latest version of the stage repo. >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz. >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Lindenmaier, Goetz >>>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47 >>>>>>>>>> To: David Holmes >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>>> So we can at least undo #4 now we have established those tests were >>>>>>>>>>> not >>>>>>>>>>> required to pass. >>>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's >>>>>>>>>> blamed on the VM if java programs are failing on PPC after they >>>>>>>>>> worked >>>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard >>>>>>>>>> it by >>>>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this >>>>>>>>>> part is >>>>>>>>>> not that performance relevant. 
>>>>>>>>>> >>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>>>> think >>>>>>>>>> I added a compile-time guard in this new webrev: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/ >>>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces >>>>>>>>>> several double negations I don't like, (#ifNdef >>>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC) >>>>>>>>>> but this way I only have to change the ppc platform. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Goetz >>>>>>>>>> >>>>>>>>>> P.S.: I will also be available over the Christmas period. >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58 >>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>> >>>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks >>>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check >>>>>>>>>> emails. >>>>>>>>>> >>>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is >>>>>>>>>>> settled. >>>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof! >>>>>>>>>>> >>>>>>>>>>> About our change: David, the causality is the other way round. >>>>>>>>>>> The change is about IRIW. >>>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads. >>>>>>>>>> >>>>>>>>>> This is the part I still have some question marks over as the >>>>>>>>>> implications are not nice for performance on non-TSO platforms. >>>>>>>>>> But I'm >>>>>>>>>> no further along in processing that paper I'm afraid. >>>>>>>>>> >>>>>>>>>>> 2. 
If we do syncs before loads, we don't need to do them after >>>>>>>>>>> stores. >>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile >>>>>>>>>>> constructor tests. >>>>>>>>>>> 4. So finally we added them again at the end of the constructor >>>>>>>>>>> after stores >>>>>>>>>>> to pass the volatile constructor tests. >>>>>>>>>> >>>>>>>>>> So we can at least undo #4 now we have established those tests >>>>>>>>>> were not >>>>>>>>>> required to pass. >>>>>>>>>> >>>>>>>>>>> We originally passed the constructor tests because the ppc memory >>>>>>>>>>> order >>>>>>>>>>> instructions are not as fine-grained as the >>>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad. >>>>>>>>>>> The only instruction >>>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, >>>>>>>>>>> therefore the >>>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The >>>>>>>>>>> proper representation >>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now >>>>>>>>>>> it's pointless >>>>>>>>>>> anyways. >>>>>>>>>>> >>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. >>>>>>>>>>> I'd be happy to add a property >>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>> >>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I >>>>>>>>>> think >>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic >>>>>>>>>> based not architecture based) as that will allow turning this >>>>>>>>>> on/off for any architecture for testing purposes. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> or the like to guard the customization. I'd like that much better. >>>>>>>>>>> Or also >>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Goetz.
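The two barrier placements compared in this message can be sketched as follows. This is only an illustrative model, not HotSpot code: `Access`, `emit_volatile` and the flag `iriw_via_load_barrier` are hypothetical names, and the IR node names from the thread appear merely as strings. It assumes, as the thread describes, a release barrier before every volatile store and an acquire barrier after every volatile load.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: the kind of volatile access being compiled.
enum class Access { VolatileLoad, VolatileStore };

// Returns the sequence of IR operations placed around one volatile access.
// iriw_via_load_barrier == true models the PPC64 choice from the thread:
// the full barrier (sync) moves in front of the load and is dropped after
// the store. false models the conventional placement after the store.
std::vector<std::string> emit_volatile(Access a, bool iriw_via_load_barrier) {
    std::vector<std::string> seq;
    if (a == Access::VolatileLoad) {
        if (iriw_via_load_barrier) {
            seq.push_back("MemBarVolatile");  // sync on PPC
        }
        seq.push_back("load");
        seq.push_back("MemBarAcquire");
    } else {
        seq.push_back("MemBarRelease");       // lwsync on PPC
        seq.push_back("store");
        if (!iriw_via_load_barrier) {
            seq.push_back("MemBarVolatile");  // the conventional "fat" membar
        }
    }
    return seq;
}
```

With the flag set, a volatile store is no longer followed by a full barrier, which is the gap that steps 3 and 4 in the list are about.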
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34 >>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>> >>>>>>>>>>> TL;DR version: >>>>>>>>>>> >>>>>>>>>>> Discussion on the c-i list has now confirmed that a >>>>>>>>>>> constructor-barrier >>>>>>>>>>> for volatiles is not required as part of the JMM specification. It >>>>>>>>>>> *may* >>>>>>>>>>> be required in an implementation that doesn't pre-zero memory to >>>>>>>>>>> ensure >>>>>>>>>>> you can't see uninitialized fields. So the tests for this are >>>>>>>>>>> invalid >>>>>>>>>>> and this part of the patch is not needed in general (ppc64 may >>>>>>>>>>> need it >>>>>>>>>>> due to other factors). >>>>>>>>>>> >>>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the >>>>>>>>>>> term :) >>>>>>>>>>> Second thanks for the reference to that paper! For reference: >>>>>>>>>>> >>>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a >>>>>>>>>>> complex interconnect) does not guarantee that a write becomes >>>>>>>>>>> visible to >>>>>>>>>>> all other hardware threads at the same time point; these >>>>>>>>>>> architectures >>>>>>>>>>> are not multiple-copy atomic." >>>>>>>>>>> >>>>>>>>>>> This is the visibility issue that I referred to and affects both >>>>>>>>>>> ARM and >>>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers >>>>>>>>>>> after the stores that need to be visible. 
I think the crux of the >>>>>>>>>>> current issue is what you wrote below: >>>>>>>>>>> >>>>>>>>>>> > The fixes for the constructor issue are only needed because we >>>>>>>>>>> > remove the sync instruction from behind stores >>>>>>>>>>> (parse3.cpp:320) >>>>>>>>>>> > and place it before loads. >>>>>>>>>>> >>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync >>>>>>>>>>> after >>>>>>>>>>> the store then you have to do something around the loads to get the >>>>>>>>>>> same >>>>>>>>>>> results! I still don't know what led you to the conclusion that the >>>>>>>>>>> only way to fix the IRIW issue was to put the fence before the >>>>>>>>>>> load - >>>>>>>>>>> maybe when I get the chance to read that paper in full it will be >>>>>>>>>>> clearer. >>>>>>>>>>> >>>>>>>>>>> So ... the basic problem is that the current structure in the VM has >>>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile >>>>>>>>>>> variables. You now want to customize that but not all the requisite >>>>>>>>>>> hooks are present. It would be better if volatile_load and >>>>>>>>>>> volatile_store were factored out so that they could be >>>>>>>>>>> implemented as >>>>>>>>>>> desired per-platform. Alternatively there could be pre- and post- >>>>>>>>>>> hooks >>>>>>>>>>> that could then be customized per platform. Otherwise you need >>>>>>>>>>> platform-specific ifdef's to handle it as per your patch. >>>>>>>>>>> >>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think >>>>>>>>>>> this >>>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier >>>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir: >>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>> >>>>>>>>>>>> I did not yet add the >>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>> or >>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic() >>>>>>>>>>>> to reduce #defined, as I got no further comment on that. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM >>>>>>>>>>>> I feel not in the position to contribute substantially. >>>>>>>>>>>> >>>>>>>>>>>> But we would like to pass the torture test suite as we consider >>>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think >>>>>>>>>>>> both tests show behavior a programmer would expect. It's bad if >>>>>>>>>>>> Java code runs fine on the more common x86 platform, and then >>>>>>>>>>>> fails on ppc. This will always first be blamed on the VM. >>>>>>>>>>>> >>>>>>>>>>>> The fixes for the constructor issue are only needed because we >>>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320) >>>>>>>>>>>> and place it before loads. Then there is no sync between volatile >>>>>>>>>>>> store >>>>>>>>>>>> and publishing the object. So we add it again in this one case >>>>>>>>>>>> (volatile store in constructor). >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> @David >>>>>>>>>>>>>> Sure. There also is no solution as you require for the >>>>>>>>>>>>>> taskqueue problem yet, >>>>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>>>> continuous. >>>>>>>>>>>> That's not true, we did a lot of investigation and testing on this >>>>>>>>>>>> issue. 
>>>>>>>>>>>> And we came up with a solution we consider the best possible. If >>>>>>>>>>>> you >>>>>>>>>>>> have objections, you should at least give the draft of a better >>>>>>>>>>>> solution, >>>>>>>>>>>> we would volunteer to implement and test it. >>>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture >>>>>>>>>>>> issues. >>>>>>>>>>>> >>>>>>>>>>>> @David >>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term >>>>>>>>>>>>> and >>>>>>>>>>>>> can't find any reference to it. >>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the >>>>>>>>>>>> ARM and >>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and >>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient >>>>>>>>>>>> Work-Stealing for >>>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen >>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP '13) when analysing the >>>>>>>>>>>> taskqueue problem. >>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf >>>>>>>>>>>> >>>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I >>>>>>>>>>>> used 'read' >>>>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name >>>>>>>>>>>> above). >>>>>>>>>>>> >>>>>>>>>>>> Best regards and thanks for all your involvements, >>>>>>>>>>>> Goetz. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>> Sent: Mittwoch, 27.
November 2013 12:53 >>>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>> >>>>>>>>>>>> Hi Goetz, >>>>>>>>>>>> >>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>> Hi David, >>>>>>>>>>>>> >>>>>>>>>>>>> -- Volatile in constructor >>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a >>>>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32 >>>>>>>>>>>>> processors >>>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks >>>>>>>>>>>>> Aleksey!) >>>>>>>>>>>> >>>>>>>>>>>> And see follow ups - the tests are invalid. >>>>>>>>>>>> >>>>>>>>>>>>> -- IRIW issue >>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with >>>>>>>>>>>>>> a few >>>>>>>>>>>>>> moments thought. >>>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue >>>>>>>>>>>>> problem yet, >>>>>>>>>>>>> and that's being discussed now for almost a year. >>>>>>>>>>>> >>>>>>>>>>>> It may have started a year ago but work on it has hardly been >>>>>>>>>>>> continuous. >>>>>>>>>>>> >>>>>>>>>>>>>> You are implying there is a problem here that will >>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>>>> different?) >>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore >>>>>>>>>>>>> I contributed a >>>>>>>>>>>>> solution with the #defines, and that's correct for all, but not >>>>>>>>>>>>> nice, I admit. >>>>>>>>>>>>> (I don't really know about ARM, though). >>>>>>>>>>>>> So if I can write down a nicer solution testing for methods that >>>>>>>>>>>>> are evaluated >>>>>>>>>>>>> by the C-compiler I'm happy.
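One way to read "methods that are evaluated by the C-compiler" is a predicate that folds to a constant, so that shared code can use an ordinary `if` instead of scattered `#ifdef`s. The sketch below is only an assumption about what such a predicate could look like: `OrderAccessSketch`, `kCpuIsMultipleCopyAtomic` and the two counting helpers are hypothetical, only the macro name `CPU_NOT_MULTIPLE_COPY_ATOMIC` is taken from the thread, and `constexpr` is used for brevity.

```cpp
#include <cassert>

// The platform header would define CPU_NOT_MULTIPLE_COPY_ATOMIC (name from
// the thread) on a PPC64-like build; everything else here is hypothetical.
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
constexpr bool kCpuIsMultipleCopyAtomic = false;
#else
constexpr bool kCpuIsMultipleCopyAtomic = true;
#endif

struct OrderAccessSketch {
    // Folded to a constant at compile time, so the untaken branch in the
    // callers below can be eliminated by the compiler.
    static constexpr bool cpu_is_multiple_copy_atomic() {
        return kCpuIsMultipleCopyAtomic;
    }
};

// Number of full barriers placed before a volatile load.
int full_barriers_before_volatile_load() {
    return OrderAccessSketch::cpu_is_multiple_copy_atomic() ? 0 : 1;
}

// Number of full barriers placed after a volatile store.
int full_barriers_after_volatile_store() {
    return OrderAccessSketch::cpu_is_multiple_copy_atomic() ? 1 : 0;
}
```

Either way exactly one full barrier is paid per store/load pair; the predicate only decides on which side it sits.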
>>>>>>>>>>>>> >>>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the >>>>>>>>>>>>> problem >>>>>>>>>>>>> is that >>>>>>>>>>>>> store >>>>>>>>>>>>> sync >>>>>>>>>>>>> does not assure multiple-read-atomicity, >>>>>>>>>>>>> only >>>>>>>>>>>>> sync >>>>>>>>>>>>> load >>>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to >>>>>>>>>>>>> pass that test. >>>>>>>>>>>> >>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the >>>>>>>>>>>> term and >>>>>>>>>>>> can't find any reference to it. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>> The JMM is fine. And >>>>>>>>>>>>> store >>>>>>>>>>>>> MemBarVolatile >>>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions >>>>>>>>>>>>> that >>>>>>>>>>>>> do what is required. >>>>>>>>>>>>> >>>>>>>>>>>>> So if you are off soon, please let's come to a solution that >>>>>>>>>>>>> might be improvable in the way it's implemented, but that >>>>>>>>>>>>> allows us to implement a correct PPC64 port. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Goetz. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM >>>>>>>>>>>>> To: Lindenmaier, Goetz >>>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; >>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Goetz, >>>>>>>>>>>>> >>>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>> Hi everybody, >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks a lot for the detailed reviews! >>>>>>>>>>>>>> I'll try to answer to all in one mail. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM >>>>>>>>>>>>>>> to occur before the reference is assigned; >>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after >>>>>>>>>>>>>> initializing >>>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey >>>>>>>>>>>>>> Shipilev >>>>>>>>>>>>>> and Doug Lea, and they agreed. >>>>>>>>>>>>>> Also, concurrency torture tests >>>>>>>>>>>>>> LongVolatileTest >>>>>>>>>>>>>> AtomicIntegerInitialValueTest >>>>>>>>>>>>>> will fail. >>>>>>>>>>>>>> (In addition, observing 0 instead of the initial value of a >>>>>>>>>>>>>> volatile field would be >>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in >>>>>>>>>>>>>> AtomicInteger.) >>>>>>>>>>>>> >>>>>>>>>>>>> The effects of unsafe publication are always surprising - >>>>>>>>>>>>> volatiles do >>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM >>>>>>>>>>>>> that >>>>>>>>>>>>> requires the constructor barrier - discussions with Doug and >>>>>>>>>>>>> Aleksey >>>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a >>>>>>>>>>>>> missing constructor barrier. >>>>>>>>>>>>> >>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely >>>>>>>>>>>>>>> heavyweight >>>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We >>>>>>>>>>>>>> don't >>>>>>>>>>>>>> see a way to implement this cheaper. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> - these algorithms should be expressed using the correct >>>>>>>>>>>>>>> OrderAccess operations >>>>>>>>>>>>>> Basically, I agree on this. But you also have to take into >>>>>>>>>>>>>> account >>>>>>>>>>>>>> that due to the different memory ordering instructions on >>>>>>>>>>>>>> different platforms >>>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>>>> An example: >>>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier >>>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier >>>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like >>>>>>>>>>>>>> this: >>>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore | >>>>>>>>>>>>>> Assembler::StoreStore) >>>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad) >>>>>>>>>>>>>> Just doing what is required. >>>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable, >>>>>>>>>>>>>> fine grained operations: >>>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore, >>>>>>>>>>>>>> StoreStore, LoadLoad >>>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore, >>>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad >>>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations >>>>>>>>>>>>>> are more (too) powerful, >>>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I >>>>>>>>>>>>>> can not implement >>>>>>>>>>>>>> MemBarRelease empty, as it is also used independently. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable >>>>>>>>>>>>>> issue. >>>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the >>>>>>>>>>>>>> read >>>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would propose to guard the code by >>>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better >>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic() >>>>>>>>>>>>>> Else, David, how would you propose to implement this platform >>>>>>>>>>>>>> independent? >>>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.) >>>>>>>>>>>>> >>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a >>>>>>>>>>>>> few >>>>>>>>>>>>> moments thought. 
You are implying there is a problem here that >>>>>>>>>>>>> will >>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so >>>>>>>>>>>>> different?) and I can not take that on face value at the >>>>>>>>>>>>> moment. The >>>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM >>>>>>>>>>>>> requirements for >>>>>>>>>>>>> volatile accesses is if there are global visibility issues that >>>>>>>>>>>>> are not >>>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the >>>>>>>>>>>>> store >>>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me >>>>>>>>>>>>> of the >>>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of >>>>>>>>>>>>> software >>>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we >>>>>>>>>>>>> don't have >>>>>>>>>>>>> that on ppc!) >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days >>>>>>>>>>>>> I will >>>>>>>>>>>>> be heading out on vacation for two weeks. >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Goetz. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- Other ports: >>>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so >>>>>>>>>>>>>> it might >>>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead >>>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of >>>>>>>>>>>>>> them). >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- MemBarStoreStore after initialization >>>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I >>>>>>>>>>>>>> can >>>>>>>>>>>>>> prepare an extra webrev for hotspot-comp. 
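The IRIW (Independent Reads of Independent Writes) pattern under discussion can be reproduced as a small litmus test. The sketch below is not the torture-suite test itself; it is a C++ analogue that assumes `std::atomic` with the default sequentially consistent ordering stands in for Java volatile fields. The function name `run_iriw` and the iteration count are illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the IRIW litmus test `iterations` times and returns how often the
// forbidden outcome was observed: reader 0 sees x written but not y, while
// reader 1 sees y written but not x, i.e. the two readers disagree on the
// order of the two independent writes.
int run_iriw(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

        // Two independent writers.
        std::thread w0([&] { x.store(1); });
        std::thread w1([&] { y.store(1); });
        // Two independent readers, reading in opposite orders.
        std::thread rd0([&] { r1 = x.load(); r2 = y.load(); });
        std::thread rd1([&] { r3 = y.load(); r4 = x.load(); });

        w0.join();
        w1.join();
        rd0.join();
        rd1.join();

        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

Under sequential consistency the count must stay at zero on every platform. How a given compiler achieves that on a machine that is not multiple-copy atomic is exactly the barrier-placement question argued over in this thread.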
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM >>>>>>>>>>>>>> To: Vladimir Kozlov >>>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; >>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net' >>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of >>>>>>>>>>>>>> Independent Reads of Independent Writes >>>>>>>>>>>>>> >>>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable >>>>>>>>>>>>>> way :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote: >>>>>>>>>>>>>>> I have to ask David to do correctness evaluation. >>>>>>>>>>>>>> >>>>>>>>>>>>>> From what I understand what we see here is an attempt to >>>>>>>>>>>>>> fix an >>>>>>>>>>>>>> existing issue with the implementation of volatiles so that the >>>>>>>>>>>>>> IRIW >>>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make >>>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when >>>>>>>>>>>>>> doing the >>>>>>>>>>>>>> load. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I >>>>>>>>>>>>>> would be >>>>>>>>>>>>>> happy to let them do whatever they like (surely this kills >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code >>>>>>>>>>>>>> that >>>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY >>>>>>>>>>>>>> this is >>>>>>>>>>>>>> polluting the shared code. 
My concern is similar to what I said >>>>>>>>>>>>>> with the >>>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using >>>>>>>>>>>>>> the >>>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired >>>>>>>>>>>>>> properties >>>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed >>>>>>>>>>>>>> on a >>>>>>>>>>>>>> given architecture then the implementation in OrderAccess should >>>>>>>>>>>>>> reduce >>>>>>>>>>>>>> to a no-op. >>>>>>>>>>>>>> >>>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed >>>>>>>>>>>>>> under >>>>>>>>>>>>>> the JMM. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am fine with suggested changes because you did not change our >>>>>>>>>>>>>>> current >>>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now). >>>>>>>>>>>>>>> But may be it should be done using more general query which >>>>>>>>>>>>>>> is set >>>>>>>>>>>>>>> depending on platform: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering() >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> or similar to what we use now: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering() >>>>>>>>>>>>>> >>>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the >>>>>>>>>>>>>> Java >>>>>>>>>>>>>> Memory Model, there should not be any need to call this out >>>>>>>>>>>>>> explicitly >>>>>>>>>>>>>> like this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are >>>>>>>>>>>>>> there >>>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM >>>>>>>>>>>>>> defines? >>>>>>>>>>>>>>> From what I understand our ppc port is also affected. >>>>>>>>>>>>>>> David? >>>>>>>>>>>>>> >>>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry. >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be >>>>>>>>>>>>>>> inside else {}. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64 >>>>>>>>>>>>>>> specific which >>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need >>>>>>>>>>>>>>> PPC64_ONLY() >>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many >>>>>>>>>>>>>>> #ifdefs. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (is_vol) { >>>>>>>>>>>>>>> // See comment in do_get_xxx(). >>>>>>>>>>>>>>> #ifndef PPC64 >>>>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar >>>>>>>>>>>>>>> #else >>>>>>>>>>>>>>> if (is_field) { >>>>>>>>>>>>>>> // Add MemBarRelease for constructors which write >>>>>>>>>>>>>>> volatile field >>>>>>>>>>>>>>> (PPC64). >>>>>>>>>>>>>>> set_wrote_volatile(true); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Vladimir >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I prepared a webrev with fixes for PPC for the >>>>>>>>>>>>>>>> VolatileIRIWTest of >>>>>>>>>>>>>>>> the torture test suite: >>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Example: >>>>>>>>>>>>>>>> volatile x=0, y=0 >>>>>>>>>>>>>>>> __________ __________ __________ __________ >>>>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 | >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y) >>>>>>>>>>>>>>>> read(y) read(x) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This >>>>>>>>>>>>>>>> is only >>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the >>>>>>>>>>>>>>>> threads >>>>>>>>>>>>>>>> doing the loads.
Thus we implement volatile read as >>>>>>>>>>>>>>>> sync-load-acquire >>>>>>>>>>>>>