RFR: 8187033: [PPC] Imporve performance of ObjectStreamClass.getClassDataLayout()
Kazunori Ogata
OGATAK at jp.ibm.com
Wed Sep 20 08:14:25 UTC 2017
Hi Peter,
The performance improvement was +2.9%. It is faster than the version that
uses an extra dereference (+2.2%).
Although it's slower than the full-fence variation, I think I
understand Hans's concern, and I agree that your fix is the right answer.
@Hans,
I thought DATA_LAYOUT_GUESS in your example was fetched from memory
somewhere at an arbitrary time, but I now understand that "prefetch
dataLayout" means computing the value of dataLayout without accessing
memory. I'm not sure how to compute it, but I noticed that even picking a
random value has a non-zero probability of passing the check at line
1204.5.
I agree that the load of slots[17] can happen before the full fence
executes if the value of dataLayout does not come from memory and there is
no data dependence between the write to dataLayout and the read from
dataLayout. I appreciate your comments.
Regards,
Ogata
From: Hans Boehm <hboehm at google.com>
To: Kazunori Ogata <OGATAK at jp.ibm.com>
Cc: Peter Levart <peter.levart at gmail.com>, core-libs-dev
<core-libs-dev at openjdk.java.net>
Date: 2017/09/19 05:47
Subject: Re: RFR: 8187033: [PPC] Imporve performance of
ObjectStreamClass.getClassDataLayout()
On Mon, Sep 18, 2017 at 10:52 AM, Kazunori Ogata <OGATAK at jp.ibm.com>
wrote:
>
> Hi Peter,
>
> Peter Levart <peter.levart at gmail.com> wrote on 2017/09/18 22:05:43:
>
> > On 09/18/2017 12:28 PM, Kazunori Ogata wrote:
> > > Hi Hans and Peter,
> > >
> > > Thank you for your comments.
> > >
> > > > Regarding the code Hans showed, I don't yet understand what the
> > > > problem is. Since the load at 1204b is a speculative one,
> > > > dereferencing slots[17] should not raise any exception. If the
> > > > confirmation at 1204.5 succeeds, the value of tmp must also be
> > > > correct because we put a full fence and we see a non-NULL reference
> > > > that was stored after the full fence.
> >
> > I don't know much, but I can imagine that speculative read may see the
> > value and guess it correctly based on let's say some CPU state of
> > half-processed write instruction in the pipeline, which is established
> > even before the fence instruction flushes writes to array slots. So I
> > can accept that such outcome is possible and doesn't violate JMM.
>
> To me, this seems to say that the processor/platform can't implement a
> full fence correctly. I think it is the platform's (processor's and
> compiler's) responsibility to support full fence, otherwise the platform
> can't implement all Java APIs, including VarHandle.fullFence().
As Peter said, my concern is not with exceptions, but with seeing
uninitialized data for slots[17].
The semantics of "full fences" are tricky, but basically they don't
restrict reordering in other threads, only the thread that executed the
fence. The thread with the problematic reordering here is the one that saw
a non-null dataLayout value, and hence did not execute a fence.
Hence fences generally have to be paired with either another fence in the
other thread, or some other ordering mechanism. That other ordering
mechanism is missing here, though many implementations will ensure correct
ordering, due to hardware dependence-based ordering guarantees. But the JMM
does not promise that.
Hans
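
To illustrate the pairing point above, here is a minimal sketch. It is not
the actual ObjectStreamClass code and not necessarily the fix under review.
The class LayoutHolder, the int[] element type, and computeLayout() are
hypothetical names made up for illustration. The sketch publishes the array
with a release store and reads it with an acquire load, so ordering is
established on the reader side as well, instead of relying on a fence that
only the writing thread executes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for the pattern under discussion; not JDK source.
final class LayoutHolder {
    // Plain field; every access below goes through the VarHandle.
    private int[] dataLayout;

    private static final VarHandle DATA_LAYOUT;
    static {
        try {
            DATA_LAYOUT = MethodHandles.lookup()
                    .findVarHandle(LayoutHolder.class, "dataLayout", int[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int[] getDataLayout() {
        // Acquire load: pairs with the release store below, so a non-null
        // result implies the array contents written before publication
        // are visible to this thread.
        int[] layout = (int[]) DATA_LAYOUT.getAcquire(this);
        if (layout == null) {
            layout = computeLayout();
            // Release store: the writes to layout[...] cannot be
            // reordered after the publication of the reference.
            DATA_LAYOUT.setRelease(this, layout);
        }
        return layout;
    }

    // Assumed to be idempotent: every caller computes the same contents.
    private static int[] computeLayout() {
        int[] slots = new int[18];
        slots[17] = 42; // stands in for the initialization of slots[17]
        return slots;
    }
}
```

Two threads may both compute the layout; the sketch assumes that is benign
because both compute the same contents. A reader that loaded dataLayout
with a plain read would have no such guarantee under the JMM, even if the
writer had executed VarHandle.fullFence() before publishing.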