RFR: 8187033: [PPC] Imporve performance of ObjectStreamClass.getClassDataLayout()
Peter Levart
peter.levart at gmail.com
Sun Sep 17 20:51:07 UTC 2017
Hi,
On 09/15/17 19:54, Hans Boehm wrote:
> The problem occurs if this is transformed (by hardware or compiler) to
>
> 1196 ClassDataSlot[] getClassDataLayout() throws
> InvalidClassException {
> 1197 // REMIND: synchronize instead of relying on fullFence()?
> <prefetch dataLayout>
> 1198 ClassDataSlot[] slots = DATA_LAYOUT_GUESS;
> 1199 if (slots == null) {
> if (dataLayout != DATA_LAYOUT_GUESS) <recover>
> 1200 slots = getClassDataLayout0();
> 1201 VarHandle.fullFence();
> 1202 dataLayout = slots;
> 1203 }
> 1204b tmp = slots[17];
> 1204.5 if (dataLayout != DATA_LAYOUT_GUESS) <recover>
> 1205 ...
>
> (This is only an illustration. If the problem were to occur in real
> life, it would probably occur as a
> result of a different optimization. DEC Alpha allowed this sort of
> thing for entirely different reasons.)
>
> Observe that
>
> (1) This transformation is allowed by the Java memory model, since
> dataLayout is not a final field.
> (2) This code breaks if another thread runs all of the initialization
> code, including the code that sets
> slots[17] and the code that sets dataLayout, between 1204b and 1204.5,
> but the check in
> 1204.5 still succeeds (because we guessed well). tmp will contain the
> pre-initialization value of slots[17].
>
> The fence is not executed by the reading thread, and has no impact on
> ordering within the reading thread.
>
> C++ fences have no effect unless they are paired with another fence or
> ordered atomic operation in
> the other thread involved in the communication. I think that is the
> current intent for Java as well.
Well, in that case, it's better to stick with final fields...
@Ogata
You said you implemented 4 variants:
On 09/04/17 07:20, Kazunori Ogata wrote:
> 1) Put VarHandle.fullFence() between initialization of ClassDataSlot[] and
> writing the reference to non-volatile dataLayout.
> Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-fence/
>
> 2) Use Unsafe.getObjectAcquire() and Unsafe.putObjectRelease() for
> reading/writing dataLayout.
> Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-unsafe/
>
> 3) Put reference to ClassDataSlot[] into a final field of an object and
> store the object to non-volatile dataLayout. Every invocation of
> getDataLayout() reads the final field needs to deference the object
> pointer.
> Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final/
>
> 4) Put reference to ClassDataSlot[] into a final field of an object, read
> the final field immediately after the object creation, and store it to
> non-volatile dataLayout. I think this is also correct based on the
> semantics of final fields and data dependency.
> Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final2/
>
>
> The performance improvements were:
>
> 1) +3.5%
> 2) +1.1%
> 3) +2.2%
> 4) +3.4%
>
The 1st and 4th are not correct as we have established. The 3rd is
promising, but does not have the most speed improvement. Perhaps because
of extra de-referencing.
What if 'dataLayout' was not an array of ClassDataSlot records, each of
them containing a reference to an ObjectStreamClass and a boolean, but
an object containing two arrays:
static class ClassDataLayout {
final ObjectStreamClass[] descs;
final boolean[] hasDatas;
}
Such object could be "unsafely" published. By eliminating the
intermediate ClassDataSlot object, number of de-references should be
kept down.
Here's a prototype:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8187033_ObjectStreamClass.dataLayout/webrev.01/
Could you give it a try in your benchmark and compare it with your last
approach (with fullFence)?
Regards, Peter
More information about the core-libs-dev
mailing list