enhancement of cmpxchg and copy_to_survivor for ppc64
David Holmes
david.holmes at oracle.com
Sat Apr 16 07:43:20 UTC 2016
Hi Hiroshi,
As the diff file did not survive the mail process, I can't see the
actual proposed changes. There doesn't seem to be a bug for this, so
could you please file one? Also, can you get someone to host the webrev
for you on cr.openjdk.java.net? Or else include the diff in the bug report.
It is fine for ppc to have variations of cmpxchg with different memory
barrier semantics, but the shared API must not be affected as there is a
requirement that the basic form of this operation provide "full
bi-directional fence" semantics. Note that these semantics are not in
place to fulfill Java Memory Model requirements, but are an internal
contract in hotspot.
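
For illustration only, the contract described above could be sketched as
follows. This is a minimal sketch under stated assumptions, not the HotSpot
code: std::atomic stands in for the platform-specific implementation, and the
names cmpxchg_sketch and CmpxchgOrder are hypothetical. The default keeps the
conservative behaviour, so existing callers are unaffected; a weaker ordering
has to be requested explicitly. C++ seq_cst fences only approximate the "full
bi-directional fence" contract.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical ordering selector; not part of the real HotSpot API.
enum class CmpxchgOrder { conservative, relaxed };

// Returns the value observed at dest: equal to compare_value on success,
// otherwise the value that was actually found there.
inline intptr_t cmpxchg_sketch(intptr_t exchange_value,
                               std::atomic<intptr_t>* dest,
                               intptr_t compare_value,
                               CmpxchgOrder order = CmpxchgOrder::conservative) {
  intptr_t expected = compare_value;
  if (order == CmpxchgOrder::conservative) {
    // Default path: fence before and after the exchange (assumed mapping).
    std::atomic_thread_fence(std::memory_order_seq_cst);
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_seq_cst,
                                  std::memory_order_seq_cst);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  } else {
    // Opt-in path: atomicity only, no ordering of surrounding accesses.
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed);
  }
  return expected;
}
```

A call such as cmpxchg_sketch(new_value, &word, old_value) keeps the
conservative semantics; only callers that pass CmpxchgOrder::relaxed give
them up.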
Thanks,
David
On 12/04/2016 3:59 AM, Christian Thalinger wrote:
> [This should be on hotspot-runtime-dev. BCC’ing hotspot-compiler-dev.]
>
>> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii <HORII at jp.ibm.com> wrote:
>>
>> Dear all:
>>
>> Can I please request reviews for the following change?
>> This change was created for JDK 9 and ppc64.
>>
>> Description:
>> This change adds compare-and-exchange options for the POWER architecture.
>> As described in atomic_linux_ppc.inline.hpp, the current implementation of
>> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>> general purposes because the two sync calls, one before and one after
>> cmpxchg, keep memory consistent. However, they sometimes cause overhead
>> because sync instructions are very expensive in the current POWER chip design.
>> With this change, callers can explicitly specify whether to run fence and
>> acquire through two additional bool parameters. Because their default values
>> are "true", it is not necessary to modify existing cmpxchg calls.
>>
>> In addition, with the new parameters of cmpxchg, this change improves
>> performance of copy_to_survivor in the parallel GC.
>> copy_to_survivor changes forward pointers by using cmpxchg. This
>> operation doesn't require any sync instructions, in my understanding.
>> A pointer is changed at most once in a GC and when cmpxchg fails,
>> the latest pointer is available for the caller.
>>
>> When I evaluated SPECjbb2013 (slightly customized because the obsolete
>> grizzly does not support the new version format of Java 9), the pause time
>> of young GC was reduced by 10% to 20%.
>>
>> Summary of source code changes:
>>
>> * src/share/vm/runtime/atomic.hpp
>> * src/share/vm/runtime/atomic.cpp
>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>> - Add two arguments, fence and acquire, to cmpxchg for PPC64 only.
>> Although cmpxchg in atomic_linux_ppc.inline.hpp has some branches,
>> they are eliminated when it is inlined into callers.
>>
>> * src/share/vm/oops/oop.inline.hpp
>> - Changed cas_set_mark to call cmpxchg without fence and acquire.
>> cas_set_mark is called only by cas_forward_to, which is called only by
>> copy_to_survivor_space and oop_promotion_failed in
>> psPromotionManager.
>>
>> Code change:
>>
>> Please see an attached diff file that was generated with "hg diff -g"
>> under the latest hotspot directory.
>>
>> Passed test:
>> SPECjbb2013 (customized)
>>
>> * I believe some other cmpxchg calls can be optimized by dropping fence
>> or acquire, because two sync calls are more conservative than the
>> Java memory model requires.
>>
>>
>>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>
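
The forwarding-pointer update described in the quoted proposal (a pointer is
changed at most once per GC, and a failed cmpxchg hands the caller the latest
pointer) can be sketched as below. This is an illustrative sketch only:
ObjHeader and forward_sketch are hypothetical names, std::atomic stands in for
the HotSpot primitives, and 0 is an assumed encoding for "not yet forwarded".
Whether relaxed ordering is sufficient for a thread that later dereferences
the forwardee is exactly the question under review, and this sketch does not
settle it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical object header holding a single mark/forwarding word.
struct ObjHeader {
  std::atomic<uintptr_t> mark{0};  // 0 = not yet forwarded (assumed encoding)
};

// Tries to install new_copy as the forwardee of obj without any fence.
// Returns the forwardee that won the race: new_copy if this call installed
// it, otherwise the pointer another thread installed earlier.
inline uintptr_t forward_sketch(ObjHeader* obj, uintptr_t new_copy) {
  uintptr_t expected = 0;
  if (obj->mark.compare_exchange_strong(expected, new_copy,
                                        std::memory_order_relaxed,
                                        std::memory_order_relaxed)) {
    return new_copy;
  }
  // On failure, compare_exchange_strong stored the current value in expected.
  return expected;
}
```

The first caller for a given object gets its own copy back; every later
caller gets the first caller's copy and can discard its own.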