RFR: 8187033: [PPC] Imporve performance of ObjectStreamClass.getClassDataLayout()

Kazunori Ogata OGATAK at jp.ibm.com
Mon Sep 18 10:28:31 UTC 2017


Hi Hans and Peter,

Thank you for your comments.

Regarding the code Hans showed, I don't yet understand what it the 
problem.  Since the load at 1204b is a speculative one, dereferencing 
slots[17] should not raise any exception.  If the confirmation at 1204.5 
succeeds, the value of tmp must also be correct because we put full fence 
and we see a non-NULL reference that was stored after the full fence.

Also note that even the original code doesn't assume that a ClassDataSlot 
array is singleton for a given class.  So even if another method modifies 
dataLayout between 1204b and 1204.5, the current thread can keep using the 
reference loaded earlier.  If a thread reads a non-NULL reference, the 
ClassDataSlot[] entries reachable through the reference must be correct 
because we put full fence.


@Peter,

Thank you for the fix.  I'll measure the performance.


Regards,
Ogata


Peter Levart <peter.levart at gmail.com> wrote on 2017/09/18 05:51:07:

> From: Peter Levart <peter.levart at gmail.com>
> To: Hans Boehm <hboehm at google.com>
> Cc: Kazunori Ogata <OGATAK at jp.ibm.com>, core-libs-dev <core-libs-
> dev at openjdk.java.net>
> Date: 2017/09/18 05:51
> Subject: Re: RFR: 8187033: [PPC] Imporve performance of 
> ObjectStreamClass.getClassDataLayout()
> 
> Hi,

> On 09/15/17 19:54, Hans Boehm wrote:
> The problem occurs if this is transformed (by hardware or compiler) to
> 
> 1196     ClassDataSlot[] getClassDataLayout() throws 
InvalidClassException {
> 1197         // REMIND: synchronize instead of relying on fullFence()?
>                  <prefetch dataLayout>
> 1198         ClassDataSlot[] slots = DATA_LAYOUT_GUESS;
> 1199         if (slots == null) {
>                      if (dataLayout != DATA_LAYOUT_GUESS) <recover>
> 1200             slots = getClassDataLayout0();
> 1201             VarHandle.fullFence();
> 1202             dataLayout = slots;
> 1203         }
> 1204b       tmp = slots[17];
> 1204.5      if (dataLayout != DATA_LAYOUT_GUESS) <recover>
> 1205     ... 
> 
> (This is only an illustration. If the problem were to occur in real 
life, 
> it would probably occur as a
> result of a different optimization. DEC Alpha allowed this sort of thing 

> for entirely different reasons.)
> 
> Observe that
> 
> (1) This transformation is allowed by the Java memory model, since 
> dataLayout is not a final field.
> (2) This code breaks if another thread runs all of the initialization 
> code, including the code that sets
> slots[17] and the code that sets dataLayout, between 1204b and 1204.5, 
but
> the check in
> 1204.5 still succeeds (because we guessed well). tmp will contain the 
pre-
> initialization value of slots[17].
> 
> The fence is not executed by the reading thread, and has no impact on 
> ordering within the reading thread.
> 
> C++ fences have no effect unless they are paired with another fence or 
> ordered atomic operation in
> the other thread involved in the communication. I think that is the 
> current intent for Java as well.
> 
> Well, in that case, it's better to stick with final fields...

> @Ogata
> 
> You said you implemented 4 variants:

> On 09/04/17 07:20, Kazunori Ogata wrote: 
> 1) Put VarHandle.fullFence() between initialization of ClassDataSlot[] 
and 
> writing the reference to non-volatile dataLayout.
>   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-fence/
> 
> 2) Use Unsafe.getObjectAcquire() and Unsafe.putObjectRelease() for 
> reading/writing dataLayout.
>   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-unsafe/
> 
> 3) Put reference to ClassDataSlot[] into a final field of an object and 
> store the object to non-volatile dataLayout.  Every invocation of 
> getDataLayout() reads the final field needs to deference the object 
> pointer.
>   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final/
> 
> 4) Put reference to ClassDataSlot[] into a final field of an object, 
read 
> the final field immediately after the object creation, and store it to 
> non-volatile dataLayout.  I think this is also correct based on the 
> semantics of final fields and data dependency.
>   Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final2/
> 
> 
> The performance improvements were:
> 
> 1) +3.5%
> 2) +1.1%
> 3) +2.2%
> 4) +3.4%
> 

> 
> The 1st and 4th are not correct as we have established. The 3rd is 
> promising, but does not have the most speed improvement. Perhaps because 

> of extra de-referencing.
> 
> What if 'dataLayout' was not an array of ClassDataSlot records, each of 
> them containing a reference to an ObjectStreamClass and a boolean, but 
an 
> object containing two arrays:
> 
>     static class ClassDataLayout {
>         final ObjectStreamClass[] descs;
>         final boolean[] hasDatas;
>     }
> 
> Such object could be "unsafely" published. By eliminating the 
intermediate
> ClassDataSlot object, number of de-references should be kept down.
> 
> Here's a prototype:
> 
>     http://cr.openjdk.java.net/~plevart/jdk10-dev/
> 8187033_ObjectStreamClass.dataLayout/webrev.01/
> 
> 
> Could you give it a try in your benchmark and compare it with your last 
> approach (with fullFence)?
> 
> Regards, Peter




More information about the core-libs-dev mailing list