RFR (S): 8007898: Incorrect optimization of Memory Barriers in Matcher::post_store_load_barrier()
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jul 11 09:32:23 PDT 2013
Thank you, Martin
Vladimir
On 7/11/13 9:06 AM, Doerr, Martin wrote:
> Hi Vladimir,
>
> thanks for your explanation. If there's always a MemBarAcquire before the MembarVolatile, it should work. I agree.
>
> Searching for MemBarRelease for optimizing out redundant MemBarAcquire is different idea.
> We can discuss our solution when it comes with the PPC64 port (which will probably not be too soon).
>
> Anyway, your change looks good. We'll test it in our SAPJVM as well.
>
> Kind regards,
> Martin
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Donnerstag, 11. Juli 2013 17:41
> To: Doerr, Martin
> Cc: hotspot compiler; Aleksey Shipilev; Lindenmaier, Goetz
> Subject: Re: RFR (S): 8007898: Incorrect optimization of Memory Barriers in Matcher::post_store_load_barrier()
>
> Hi Martin,
>
> On 7/11/13 6:06 AM, Doerr, Martin wrote:
>> Hi Vladimir,
>>
>> thanks for fixing these problems. The change looks good.
>>
>> There's only one question remaining. We're not sure if the following pattern can occur:
>
> There are release and acquire membars in graph (I do not show proj nodes for simplicity):
>
>>
>> MemBarVolatile
>> ^ ^
>> | |
>> CtrlProj MemProj
>> ^ ^
>> | |
>> | Load volatile
> | ^
> | |
> MemBarAcquire
> ^
> |
> MemBarRelease
> ^
> |
> Store volatile
> ^
>> |
>> MemBarVolatile
>>
>
> So if there is volatile load between two stores, we should have MemBarAcquire between as well. That is why our current
> code should work (we don't skip first MemBarVolatile if we see MemBarAcquire).
>
>> If this can occur, we will have to traverse the memory edges in addition.
>
> I also thought so before but control path will have MemBarAcquire before we see second MemBarVolatile we may not need to
> look on memory path.
>
>> We have implemented a similar optimization to optimize MemBarAcquire before MemBarVolatile away
>> and we had to check the memory edges. An excerpt of our code is below.
>
> Volatile and acquire are different types of membar in hw: StoreLoad vs LoadStore.
> Yes, release membar is also LoadStore so in this sense you can "skip" MemBarAcquire (as in the graph above) but you need
> to search for MemBarRelease in such case.
>
> Thanks,
> Vladimir
>
>>
>> Kind regards,
>> Martin and Goetz
>>
>>
>>
>> // Get the Proj node, mem_proj, that can be used to iterate forward
>> Node *mem_proj = NULL;
>> DUIterator_Fast imax, i = mem->fast_outs(imax);
>> while( true ) {
>> mem_proj = mem->fast_out(i); // Throw out-of-bounds if proj not found
>> assert( mem_proj->is_Proj(), "only projections here" );
>> ProjNode *proj = (ProjNode*)mem_proj;
>> if( proj->_con == TypeFunc::Memory &&
>> !C->node_arena()->contains(mem_proj) ) // Unmatched old-space only
>> break;
>> i++;
>> }
>>
>> bool found_membar = false;
>> for( DUIterator_Fast jmax, j = mem_proj->fast_outs(jmax); j < jmax; j++ ) {
>> Node *x = mem_proj->fast_out(j);
>> // Proj might have an edge to a store or load node which precedes the membar
>> if (x->is_Mem()) return false;
>>
>> // We can only allow xop == Op_MemBarVolatile when we traverse TypeFunc::Memory instead of TypeFunc::Control
>> // because a LoadNode might get scheduled before the MemBarVolatile when considering TypeFunc::Control only.
>> int xop = x->Opcode();
>> if (xop == Op_MemBarVolatile) {
>> // Make sure we're not missing Call/Phi/MergeMem by checking control edges.
>> // That means we only set found_membar=true when x can be found via control proj AND memory proj.
>> Node *in = x->in(0);
>> if (in->is_Proj() && in->in(0)==vmb) found_membar = true;
>> }
>> }
>>
>> return found_membar;
>>
>>
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Donnerstag, 11. Juli 2013 05:39
>> To: hotspot compiler
>> Cc: Aleksey Shipilev; Doerr, Martin; Lindenmaier, Goetz
>> Subject: RFR (S): 8007898: Incorrect optimization of Memory Barriers in Matcher::post_store_load_barrier()
>>
>> http://cr.openjdk.java.net/~kvn/8007898/webrev/
>>
>> The problem is caused by complicated (and questionable) barriers code
>> generated for volatile stores. It is questionable from performance view
>> because second volatile membar is for bottom memory (AliasIdxBot). So
>> any memory operations can not pass it.
>> The fix is to generate one "fat" membar instead of set of barriers.
>>
>> I also cleaned/fixed corresponding code in Matcher which tries to
>> optimize out those multiply barriers. The problem with it was it did not
>> take into account biased locking when cas could be not executed.
>> Also code for FastUnlock ('If' check) was not removed before because of,
>> I think, incorrect comment "upcoming fastlock block" - it should be
>> "fastUnlock block".
>>
>> Added regression test which shows the problem with -XX:+StressLCM[GCM]
>> flags added by Aleksey some time ago.
>>
>> Aleksey is testing performance of these changes and no regression is
>> observed so far.
>>
>> Thanks,
>> Vladimir
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
More information about the hotspot-compiler-dev
mailing list