RFR: 8187033: [PPC] Imporve performance of ObjectStreamClass.getClassDataLayout()

Peter Levart peter.levart at gmail.com
Mon Sep 18 13:05:43 UTC 2017


Hi Ogata,

On 09/18/2017 12:28 PM, Kazunori Ogata wrote:
> Hi Hans and Peter,
>
> Thank you for your comments.
>
> Regarding the code Hans showed, I don't yet understand what it the
> problem.  Since the load at 1204b is a speculative one, dereferencing
> slots[17] should not raise any exception.  If the confirmation at 1204.5
> succeeds, the value of tmp must also be correct because we put full fence
> and we see a non-NULL reference that was stored after the full fence.

I don't know much, but I can imagine that speculative read may see the 
value and guess it correctly based on let's say some CPU state of 
half-processed write instruction in the pipeline, which is established 
even before the fence instruction flushes writes to array slots. So I 
can accept that such outcome is possible and doesn't violate JMM.

>
> Also note that even the original code doesn't assume that a ClassDataSlot
> array is singleton for a given class.  So even if another method modifies
> dataLayout between 1204b and 1204.5, the current thread can keep using the
> reference loaded earlier.  If a thread reads a non-NULL reference, the
> ClassDataSlot[] entries reachable through the reference must be correct
> because we put full fence.

It is not the problem of reading from different arrays since they are 
all equal in content. The problem is reading uninitialized array slots.

>
>
> @Peter,
>
> Thank you for the fix.  I'll measure the performance.

ClassDataLayout object (instead of ClassDataSlot[] array) is also an 
opportunity to put into it some pre-computed cached state which is 
re-computed frequently in ObjectInputStream.readSerialData and maybe 
also ObjectOutputStream.writeSerialData.

I'll show the variant which pre-computes the 'hasSpecialReadMethod' 
value which is always recomputed in ObjectInputStream.readSerialData if 
your benchmark shows this is a promising direction...

Regards, Peter

>
>
> Regards,
> Ogata
>
>
> Peter Levart <peter.levart at gmail.com> wrote on 2017/09/18 05:51:07:
>
>> From: Peter Levart <peter.levart at gmail.com>
>> To: Hans Boehm <hboehm at google.com>
>> Cc: Kazunori Ogata <OGATAK at jp.ibm.com>, core-libs-dev <core-libs-
>> dev at openjdk.java.net>
>> Date: 2017/09/18 05:51
>> Subject: Re: RFR: 8187033: [PPC] Imporve performance of
>> ObjectStreamClass.getClassDataLayout()
>>
>> Hi,
>> On 09/15/17 19:54, Hans Boehm wrote:
>> The problem occurs if this is transformed (by hardware or compiler) to
>>
>> 1196     ClassDataSlot[] getClassDataLayout() throws
> InvalidClassException {
>> 1197         // REMIND: synchronize instead of relying on fullFence()?
>>                   <prefetch dataLayout>
>> 1198         ClassDataSlot[] slots = DATA_LAYOUT_GUESS;
>> 1199         if (slots == null) {
>>                       if (dataLayout != DATA_LAYOUT_GUESS) <recover>
>> 1200             slots = getClassDataLayout0();
>> 1201             VarHandle.fullFence();
>> 1202             dataLayout = slots;
>> 1203         }
>> 1204b       tmp = slots[17];
>> 1204.5      if (dataLayout != DATA_LAYOUT_GUESS) <recover>
>> 1205     ...
>>
>> (This is only an illustration. If the problem were to occur in real
> life,
>> it would probably occur as a
>> result of a different optimization. DEC Alpha allowed this sort of thing
>> for entirely different reasons.)
>>
>> Observe that
>>
>> (1) This transformation is allowed by the Java memory model, since
>> dataLayout is not a final field.
>> (2) This code breaks if another thread runs all of the initialization
>> code, including the code that sets
>> slots[17] and the code that sets dataLayout, between 1204b and 1204.5,
> but
>> the check in
>> 1204.5 still succeeds (because we guessed well). tmp will contain the
> pre-
>> initialization value of slots[17].
>>
>> The fence is not executed by the reading thread, and has no impact on
>> ordering within the reading thread.
>>
>> C++ fences have no effect unless they are paired with another fence or
>> ordered atomic operation in
>> the other thread involved in the communication. I think that is the
>> current intent for Java as well.
>>
>> Well, in that case, it's better to stick with final fields...
>> @Ogata
>>
>> You said you implemented 4 variants:
>> On 09/04/17 07:20, Kazunori Ogata wrote:
>> 1) Put VarHandle.fullFence() between initialization of ClassDataSlot[]
> and
>> writing the reference to non-volatile dataLayout.
>>    Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-fence/
>>
>> 2) Use Unsafe.getObjectAcquire() and Unsafe.putObjectRelease() for
>> reading/writing dataLayout.
>>    Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-unsafe/
>>
>> 3) Put reference to ClassDataSlot[] into a final field of an object and
>> store the object to non-volatile dataLayout.  Every invocation of
>> getDataLayout() reads the final field needs to deference the object
>> pointer.
>>    Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final/
>>
>> 4) Put reference to ClassDataSlot[] into a final field of an object,
> read
>> the final field immediately after the object creation, and store it to
>> non-volatile dataLayout.  I think this is also correct based on the
>> semantics of final fields and data dependency.
>>    Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final2/
>>
>>
>> The performance improvements were:
>>
>> 1) +3.5%
>> 2) +1.1%
>> 3) +2.2%
>> 4) +3.4%
>>
>> The 1st and 4th are not correct as we have established. The 3rd is
>> promising, but does not have the most speed improvement. Perhaps because
>> of extra de-referencing.
>>
>> What if 'dataLayout' was not an array of ClassDataSlot records, each of
>> them containing a reference to an ObjectStreamClass and a boolean, but
> an
>> object containing two arrays:
>>
>>      static class ClassDataLayout {
>>          final ObjectStreamClass[] descs;
>>          final boolean[] hasDatas;
>>      }
>>
>> Such object could be "unsafely" published. By eliminating the
> intermediate
>> ClassDataSlot object, number of de-references should be kept down.
>>
>> Here's a prototype:
>>
>>      http://cr.openjdk.java.net/~plevart/jdk10-dev/
>> 8187033_ObjectStreamClass.dataLayout/webrev.01/
>>
>>
>> Could you give it a try in your benchmark and compare it with your last
>> approach (with fullFence)?
>>
>> Regards, Peter
>



More information about the core-libs-dev mailing list