Hi Kim,
>I've discussed this with others on the GC team; we think the minimal
>required barriers are CAS with memory_order_acq_rel, plus an acquire
>barrier on the else branch of
>
> 122 if (!test_mark->is_marked()) {
>...
> 261 } else {
> 262 assert(o->is_forwarded(), "Sanity");
> 263 new_obj = o->forwardee();
> 264 }
>
>We've not done enough analysis to show this is sufficient, but we
>think anything weaker is not sufficient for shared code.
Thank you for discussing this with the GC team.
I have summarized below why my change works. I hope we are on the same page.
1. Current implementation
PSPromotionManager::copy_to_survivor_space is used to move live
objects to a different location. It uses a forwarding technique and
allows multiple threads to compete to perform the copy step.
The first thread whose CAS succeeds installs its copy in the old object
as the forwardee. The other threads must then discard their copies and
use the one created by the thread that won the race.
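The race described above can be sketched in portable C++ with std::atomic. This is a minimal illustration, not the HotSpot code: `Obj`, `forward_object` and `all_threads_agree` are hypothetical names, the `forwardee` member stands in for the forwarding pointer kept in "_mark", and the CAS deliberately uses conservative acq_rel ordering rather than the ordering under discussion.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a heap object (not HotSpot's oopDesc).
// `forwardee` plays the role of the forwarding pointer stored in "_mark".
struct Obj {
  int payload;
  std::atomic<Obj*> forwardee{nullptr};
};

// Hypothetical helper: each caller makes a private copy and races to
// install it. Exactly one CAS succeeds; every loser discards its copy
// and returns the winner's copy instead.
Obj* forward_object(Obj* o) {
  if (Obj* f = o->forwardee.load(std::memory_order_acquire)) {
    return f;                                  // already forwarded
  }
  Obj* copy = new Obj{o->payload};             // speculative copy
  Obj* expected = nullptr;
  if (o->forwardee.compare_exchange_strong(expected, copy,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
    return copy;                               // won the race
  }
  delete copy;                                 // lost: discard own copy
  return expected;                             // failed CAS loaded the winner
}

// Races `n` threads (n >= 1) on one object; true if all agree on one copy.
bool all_threads_agree(int n) {
  assert(n >= 1);
  Obj original{42};
  std::vector<Obj*> seen(static_cast<std::size_t>(n), nullptr);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&original, &seen, i] {
      seen[i] = forward_object(&original);     // each thread writes its own slot
    });
  }
  for (std::thread& t : workers) t.join();
  bool agree = true;
  for (Obj* p : seen) agree = agree && p == seen[0] && p->payload == 42;
  delete seen[0];                              // the single installed copy
  return agree;
}
```

However the threads interleave, exactly one copy is installed, so `all_threads_agree(8)` returns true.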
Written program order:
(1) create new_obj as copy of obj
(2) full fence
(3) CAS to set the forwardee with new_obj
(4) full fence
(5) access to the new_obj's field if CAS succeeds
(6) access to the forwardee with "o->forwardee()" if CAS fails
(7) access to the forwardee's field if debugging is on
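The current ordering can be approximated in C++ as below. This is a hedged sketch: `Obj` and `forward_current` are hypothetical, `std::atomic_thread_fence(std::memory_order_seq_cst)` is assumed as a stand-in for the full fences at (2) and (4), and the CAS is relaxed so that the fences carry all of the ordering. Comments map each line to steps (1)-(7).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object; `forwardee` plays the role of the
// forwarding pointer kept in "_mark".
struct Obj {
  int field;
  std::atomic<Obj*> forwardee{nullptr};
};

// Current ordering: full fences at (2) and (4) around a relaxed CAS.
Obj* forward_current(Obj* o) {
  Obj* new_obj = new Obj{o->field};                        // (1) copy obj
  std::atomic_thread_fence(std::memory_order_seq_cst);     // (2) full fence
  Obj* expected = nullptr;
  bool won = o->forwardee.compare_exchange_strong(          // (3) CAS
      expected, new_obj, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);     // (4) full fence
  Obj* result;
  if (won) {
    result = new_obj;                                      // (5) thread0 owns its copy
  } else {
    delete new_obj;                                        // lost the race: discard copy
    result = o->forwardee.load(std::memory_order_relaxed); // (6) o->forwardee()
  }
  assert(result->field == o->field);                       // (7) debug-only access
  return result;
}
```

In C++ terms, fence (2) in the winner and fence (4) in a loser synchronize through the CAS value, which is why this version is correct but stronger than necessary.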
When thread0's CAS at (3) succeeds, the new_obj copied by thread0
must be visible to thread1 at (6). (2) guarantees the ordering of
(1) and (3), although a full fence is stronger than needed just to
give thread1 a consistent view of the copied new_obj.
(5), (6), and (7) must be executed after (3). (4) clearly guarantees
this ordering, but it is redundant:
The order of (6) and (7) is guaranteed by consume ordering: (7)
dereferences the pointer loaded at (6), so (7) is address-dependent on (6).
(5) and (6) are on different control paths.
(5) and (7): thread0 owns new_obj once its CAS succeeds and can access
it without a barrier.
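The consume argument for (6) and (7) is the classic pointer-publication pattern. The sketch below is illustrative only: `Obj` and `publish_and_consume` are hypothetical, publication uses a release store rather than HotSpot's fence plus CAS, and mainstream compilers currently implement `memory_order_consume` as acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical object type; only the field matters here.
struct Obj {
  int field;
};

// One thread publishes a pointer with a release store; the other loads it
// with memory_order_consume and then dereferences it. The dereference carries
// an address dependency on the loaded pointer, which is what orders it.
int publish_and_consume() {
  std::atomic<Obj*> slot{nullptr};            // stand-in for the forwardee
  Obj* obj = new Obj{0};
  std::thread writer([obj, &slot] {
    obj->field = 42;                          // initialize before publishing
    slot.store(obj, std::memory_order_release);
  });
  Obj* p = nullptr;
  while ((p = slot.load(std::memory_order_consume)) == nullptr) {
    // spin until the pointer is published, like step (6)
  }
  int value = p->field;                       // dependent load, like step (7)
  writer.join();
  assert(p == obj);
  delete obj;
  return value;
}
```

Because the field is written before the release store and read through the consumed pointer, the reader always sees 42.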
2. Proposed change
Written program order:
(1) create new_obj as copy of obj
(2) release fence
(3) CAS to set the forwardee with new_obj
(4) no fence
(5) access to the new_obj's field if CAS succeeds
(6) access to the forwardee with "o->forwardee()" if CAS fails
(7) access to the forwardee's field if debugging is on
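For comparison, the proposed ordering can be sketched the same way, again with hypothetical names (`Obj`, `forward_proposed`) and C++ fences standing in for the HotSpot primitives. One assumption to note: portable C++ offers no way to rely on an address dependency alone after a release fence, so the sketch uses an acquire load at (6) as a conservative stand-in for the dependency ordering that the proposal obtains from ppc64.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object; `forwardee` plays the role of "_mark".
struct Obj {
  int field;
  std::atomic<Obj*> forwardee{nullptr};
};

// Proposed ordering: release fence at (2), no fence at (4).
Obj* forward_proposed(Obj* o) {
  Obj* new_obj = new Obj{o->field};                        // (1) copy obj
  std::atomic_thread_fence(std::memory_order_release);     // (2) release fence
  Obj* expected = nullptr;
  bool won = o->forwardee.compare_exchange_strong(          // (3) CAS
      expected, new_obj, std::memory_order_relaxed);
  // (4) no fence
  Obj* result;
  if (won) {
    result = new_obj;                                      // (5) thread0 owns its copy
  } else {
    delete new_obj;                                        // lost the race: discard copy
    // (6) o->forwardee(): coherence keeps this from seeing an older value
    // than the failed CAS saw; acquire pairs with the winner's fence (2).
    result = o->forwardee.load(std::memory_order_acquire);
  }
  assert(result->field == o->field);                       // (7) debug-only access
  return result;
}
```

The winner path needs no ordering after the CAS because it only touches its own copy; all the cross-thread ordering sits on the loser path.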
The release fence at (2) is sufficient to make the copied new_obj
visible to a thread whose CAS fails.
Omitting the fence at (4) is acceptable because, as explained in
section 1, it is redundant.
The order of (5), (6), and (7) is the same as the current
implementation. It is not affected by the proposed change.
3. Reason why this is sufficient
Memory coherence guarantees that all threads share a consistent view
of the accesses to the same memory location, which is "_mark" in the
target code. Thread0 writes "_mark" when its CAS at (3) succeeds, and
thread1 reads "_mark" when its CAS at (3) fails. Thread1 then reads
"_mark" again by invoking "o->forwardee()" at (6). Read-read coherence
guarantees that this second read cannot return an older value than the
one observed by the failed CAS.
(See CoRR1 in Section 8 of
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf)
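A read-read coherence check in the spirit of CoRR1 can be written with relaxed C++ atomics, whose coherence rules forbid a later read of a location from returning an older value than an earlier read in the same thread. The function name `corr_violations` is hypothetical, and the increasing counter stands in for "_mark" only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Read-read coherence (CoRR) check: a writer stores increasing values to one
// location; a reader loads it twice in a row. Coherence forbids the second
// load from returning an older value than the first, even with relaxed order.
long corr_violations(int writes) {
  assert(writes > 0);
  std::atomic<int> mark{0};                  // stand-in for "_mark"
  std::atomic<bool> done{false};
  long violations = 0;                       // touched only by the reader
  std::thread writer([&] {
    for (int v = 1; v <= writes; ++v) mark.store(v, std::memory_order_relaxed);
    done.store(true, std::memory_order_relaxed);
  });
  std::thread reader([&] {
    while (!done.load(std::memory_order_relaxed)) {
      int r1 = mark.load(std::memory_order_relaxed);
      int r2 = mark.load(std::memory_order_relaxed);
      if (r2 < r1) ++violations;             // would be a coherence violation
    }
  });
  writer.join();
  reader.join();
  return violations;
}
```

On any conforming implementation the count is zero; in the email's scenario the same rule keeps the load at (6) from seeing the unforwarded "_mark" after the failed CAS has seen the forwarded one.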
Also, the compiler cannot speculatively load "o->forwardee()" at (6)
before the CAS at (3). This is ensured by the compiler barrier
integrated into the CAS (the "memory" clobber in the volatile inline
asm code), and it is also prevented because "_mark" is declared
volatile.
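The compiler-barrier mechanism mentioned above looks like this in GCC/Clang extended asm. This is a sketch under the assumption of a GCC-compatible compiler; `Header`, `compiler_barrier` and `read_after_barrier` are hypothetical names, and the barrier constrains only the compiler, not the hardware.

```cpp
#include <cassert>

// Hypothetical header type; "mark" is volatile like HotSpot's "_mark".
struct Header {
  volatile unsigned long mark;
};

// GCC/Clang compiler-only barrier: emits no instruction, but the "memory"
// clobber tells the compiler that any memory may have changed, so it must not
// move memory accesses across it or reuse values loaded before it.
inline void compiler_barrier() {
  __asm__ __volatile__("" ::: "memory");
}

// Loads "mark" after the barrier, the way step (6) must follow the CAS at (3).
unsigned long read_after_barrier(Header* h) {
  compiler_barrier();
  return h->mark;   // volatile: always emitted as a real load from memory
}
```

In the HotSpot CAS the same "memory" clobber is attached to the asm block that contains the instructions, so the barrier and the CAS cannot be separated by the compiler.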
Best regards,
--
Michihiro,
IBM Research - Tokyo

Kim Barrett wrote on 2018/05/26 01:01:40:
> On May 22, 2018, at 12:16 PM, Doerr, Martin <martin.doerr@sap.com> wrote:
From: Kim Barrett <kim.barrett@oracle.com>
To: "Doerr, Martin" <martin.doerr@sap.com>
Cc: Michihiro Horie <HORIE@jp.ibm.com>, "hotspot-dev@openjdk.java.net" <hotspot-dev@openjdk.java.net>, "hotspot-gc-dev@openjdk.java.net" <hotspot-gc-dev@openjdk.java.net>, Gustavo Bueno Romero <gromero@br.ibm.com>, "ppc-aix-port-dev@openjdk.java.net" <ppc-aix-port-dev@openjdk.java.net>, "david.holmes@oracle.com" <david.holmes@oracle.com>
Date: 2018/05/26 01:01
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64