aarch64 DMB - patch
Andrew Dinn
adinn at redhat.com
Tue Jun 23 10:12:44 UTC 2015
Hi Benedikt,
On 17/06/15 14:26, Benedikt Wedenik wrote:
> I checked out both repositories and compared the AD-file.
> My patch also works in the latest version
> of hg.openjdk.java.net/jdk9/hs-comp.
>
> If ADinn is working on that part of the code right now, do you think I
> should talk to him directly?
You have been talking to him directly -- it's just that I have not been
responding because I have been away on holiday for a few weeks.
Firstly, here is a summary of what is currently being done to replace
memory barriers with ldar/stlr instructions.
I have already made one change in jdk9 to ensure that dmb instructions
are elided for volatile gets and non-object field volatile puts. You can
track progress for that patch via the associated JIRA issue:
https://bugs.openjdk.java.net/browse/JDK-8078263
That fix required modifying the ad file rules which match MemBarAcquire,
MemBarRelease and MemBarVolatile nodes to employ predicates which
filter out the cases where generation of a dmb can safely be omitted. It
also required changing the rules for put and get to use corresponding
predicates to generate stlr and ldar in precisely the same cases. The
predicates need to detect /exactly/ the same cases for elision and
generation of synchronizing loads/stores in order for the optimization
to be correct. You should look at the prior jdk9 aarch64 code to see why
these predicates are defined as is -- the jdk7 and jdk8 aarch64 rules
differ and are not a good starting point.
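As a rough illustration of the Java-level accesses this first fix concerns, here is a minimal sketch. The class, field and method names are hypothetical and are not taken from the JDK sources or from the patch. The comments mark the volatile get and the non-object volatile put, which are the accesses where the fix is meant to generate ldar/stlr in place of a plain load/store bracketed by dmb. What a given build actually emits depends on the compiler version and flags, so treat the comments as the intent described above rather than a guarantee.

```java
public class VolatileAccessSketch {
    // Hypothetical fields, for illustration only.
    private volatile int flag;   // non-object (int) volatile field
    private int payload;         // plain field published via 'flag'

    // Writer side: plain store followed by a volatile put.
    void publish(int value) {
        payload = value;  // plain store
        flag = 1;         // volatile put of a non-object field (candidate for stlr)
    }

    // Reader side: volatile get followed by a plain load.
    int consume() {
        if (flag == 1) {      // volatile get (candidate for ldar)
            return payload;   // plain load, ordered after the volatile get
        }
        return -1;            // nothing published yet
    }

    public static void main(String[] args) {
        VolatileAccessSketch s = new VolatileAccessSketch();
        s.publish(42);
        System.out.println(s.consume());
    }
}
```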
This first fix fails to optimize volatile object stores. That's because
the current predicates do not recognize the GC card mark nodes inserted
by the compiler. I am about to post a fix for this case to aarch64-dev
and hotspot-dev. The JIRA is
https://bugs.openjdk.java.net/browse/JDK-8078743
A follow-up fix will also optimize CAS operations to drop dmbs in favour
of ldar/stlr. This 3rd fix depends on the second fix as it requires use
of a common function to test for the presence of GC card mark nodes. The
JIRA issue is
https://bugs.openjdk.java.net/browse/JDK-8080293
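To make the second and third cases concrete, the following sketch shows a volatile object store and a CAS at the Java level. The class and member names are hypothetical. The assumption here is that the collector in use needs a card mark for reference stores, which is why a volatile put of an object field looks different to the compiler from a volatile put of an int. The CAS is expressed through java.util.concurrent.atomic.AtomicInteger, whose compareAndSet is normally compiled as an intrinsic; whether the generated code uses dmb or ldar/stlr depends on which of the fixes above are present.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ObjectStoreAndCasSketch {
    // Hypothetical members, for illustration only.
    private volatile Object ref;                              // volatile object field
    private final AtomicInteger counter = new AtomicInteger(); // starts at 0

    // Volatile object put: with a card-marking collector the compiler
    // also inserts a GC card mark after the reference store.
    void setRef(Object o) {
        ref = o;
    }

    // Volatile object get.
    Object getRef() {
        return ref;
    }

    // CAS: succeeds only if the counter currently holds 'expected'.
    boolean bump(int expected) {
        return counter.compareAndSet(expected, expected + 1);
    }

    int count() {
        return counter.get();
    }

    public static void main(String[] args) {
        ObjectStoreAndCasSketch s = new ObjectStoreAndCasSketch();
        s.setRef("hello");
        System.out.println(s.getRef());
        System.out.println(s.bump(0));   // counter was 0, so the CAS succeeds
        System.out.println(s.bump(0));   // counter is now 1, so the CAS fails
    }
}
```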
Now, as regards your proposed patch -- it appears to be addressing the
unrelated case (unrelated to my changes above, that is) of memory
barriers associated with fast lock and fast unlock operations i.e. locks
associated with synchronized methods or synchronizations on objects via
the synchronized keyword. I am not sure your patch is valid wrt the
jdk9 code base or even relative to jdk7/8.
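For reference, these are the two Java-level forms of locking meant here, shown as a small sketch with hypothetical names. When the compiler inlines the locking, entry to a synchronized method or block is where a FastLock node appears and exit is where a FastUnlock node appears; the barrier nodes discussed in the rest of this message sit next to those two points.

```java
public class LockSketch {
    // Hypothetical counter guarded by the monitor of 'this'.
    private int count;

    // Form 1: synchronized method, locks 'this' for the whole call.
    synchronized void incrementViaMethod() {
        count++;
    }

    // Form 2: synchronized block on the same monitor.
    void incrementViaBlock() {
        synchronized (this) {   // monitor enter (lock)
            count++;
        }                       // monitor exit (unlock)
    }

    synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        LockSketch s = new LockSketch();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) s.incrementViaMethod();
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++) s.incrementViaBlock();
        });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(s.get());
    }
}
```

Both threads take the same monitor, so the two forms exclude each other and no increment is lost.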
Your attachment includes a change to elide the dmb instructions planted
when a MemBarAcquireLock or MemBarReleaseLock node is matched. These are
generated, respectively, after a FastLock node and before a FastUnlock
node. The encodings for these latter two operations,
aarch64_enc_fast_lock and aarch64_enc_fast_unlock, currently employ ldxr
and stlxr at the points where the object markOop field is being tested
and updated (this is true in jdk7/8/9). Note that /ldxr/ is not an
acquiring load. So, if your contention is that the barriers can be
dropped because the markOop load-exclusive + store-exclusive pair
provides sufficiently strong memory synchronization then at the very
least your patch would need to modify the encoding to use ldaxr in place
of ldxr.
However, I am not convinced that these barriers can be removed even
granted that change. There are various other memory operations encoded
in both the fast_lock and fast_unlock cases both before and after the
load-exclusive + store-exclusive pair. I believe the point of separating
out the MemBarAcquireLock and MemBarReleaseLock from FastLock and
FastUnlock is to ensure that those related memory operations are
correctly synchronized wrt memory operations performed by other
threads which may be trying to synchronize on the same oop. If you think
I am wrong and your optimization is valid then you really need to
provide a detailed, convincing argument as to why -- n.b. that's not a
requirement to convince me but rather to convince the many experts on
this list who understand lock synchronization. Expect a lively and
lengthy debate if you want to pursue this.
regards,
Andrew Dinn