RFR (XS) CR 8014233: java.lang.Thread should be @Contended
Peter Levart
peter.levart at gmail.com
Thu May 9 14:19:55 UTC 2013
Hi Aleksey,
Wouldn't it be even better if only the threadLocalRandom* fields were
annotated with @Contended("ThreadLocal")?
Some fields within the Thread object are accessed from non-local
threads. I don't know how frequently, but isolating just
threadLocalRandom* fields from all possible false-sharing scenarios
would seem even better, no?
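
A minimal sketch of the field-level grouping suggested above. The real
@Contended is JDK-internal and does not compile in ordinary user code, so this
sketch declares its own stand-in annotation with the same shape; the stand-in
has no effect on object layout. ThreadLike, fieldsInGroup and the class name
are hypothetical and only illustrate which fields would share a contention
group.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ContendedGroupSketch {

    /** Stand-in for the JDK-internal @Contended (assumption: same shape, no layout effect). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.TYPE})
    public @interface Contended {
        String value() default "";
    }

    /** Hypothetical cut-down Thread holding only a few of the fields from the layout dump. */
    public static class ThreadLike {
        int priority;
        long tid;
        volatile int threadStatus;

        // Only the TLR state is placed in the contention group.
        @Contended("ThreadLocal") long threadLocalRandomSeed;
        @Contended("ThreadLocal") int threadLocalRandomProbe;
        @Contended("ThreadLocal") int threadLocalRandomSecondarySeed;
    }

    /** Returns the names of the fields of {@code type} annotated with the given group. */
    public static List<String> fieldsInGroup(Class<?> type, String group) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            Contended c = f.getAnnotation(Contended.class);
            if (c != null && c.value().equals(group)) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldsInGroup(ThreadLike.class, "ThreadLocal"));
    }
}
```

With the real annotation, fields in the same group would be padded as a unit
and isolated from the remaining fields, which is the effect asked about here.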
Regards, Peter
On 05/08/2013 07:29 PM, Aleksey Shipilev wrote:
> Hi,
>
> This is from our backlog after JDK-8005926. Now that the ThreadLocalRandom
> state has been merged into Thread, we have to deal with the false sharing
> induced by heavily-updated fields in Thread. TLR was padded before, and
> it makes sense to have Thread bear the @Contended annotation to
> isolate its fields in the same manner.
>
> The webrev is here:
> http://cr.openjdk.java.net/~shade/8014233/webrev.00/
>
> Testing:
> - microbenchmarks (see below)
> - JPRT cycle against jdk8-tl
>
> The extended rationale for the change follows.
>
> If we look at the current Thread layout, we can see the TLR state is
> buried within the Thread instance. The TLR state fields are by far the
> most frequently updated fields in Thread now:
>
>> Running 64-bit HotSpot VM.
>> Using compressed references with 3-bit shift.
>> Objects are 8 bytes aligned.
>>
>> java.lang.Thread
>> offset  size                     type description
>>      0    12                          (assumed to be the object header + first field alignment)
>>     12     4                      int Thread.priority
>>     16     8                     long Thread.eetop
>>     24     8                     long Thread.stackSize
>>     32     8                     long Thread.nativeParkEventPointer
>>     40     8                     long Thread.tid
>>     48     8                     long Thread.threadLocalRandomSeed
>>     56     4                      int Thread.threadStatus
>>     60     4                      int Thread.threadLocalRandomProbe
>>     64     4                      int Thread.threadLocalRandomSecondarySeed
>>     68     1                  boolean Thread.single_step
>>     69     1                  boolean Thread.daemon
>>     70     1                  boolean Thread.stillborn
>>     71     1                          (alignment/padding gap)
>>     72     4                   char[] Thread.name
>>     76     4                   Thread Thread.threadQ
>>     80     4                 Runnable Thread.target
>>     84     4              ThreadGroup Thread.group
>>     88     4              ClassLoader Thread.contextClassLoader
>>     92     4     AccessControlContext Thread.inheritedAccessControlContext
>>     96     4           ThreadLocalMap Thread.threadLocals
>>    100     4           ThreadLocalMap Thread.inheritableThreadLocals
>>    104     4                   Object Thread.parkBlocker
>>    108     4            Interruptible Thread.blocker
>>    112     4                   Object Thread.blockerLock
>>    116     4 UncaughtExceptionHandler Thread.uncaughtExceptionHandler
>>    120                                (object boundary, size estimate)
>> VM reports 120 bytes per instance
>
> Assuming current x86 hardware with 64-byte cache line sizes and current
> class layout, we can see the trailing fields in Thread are providing
> enough insulation from the false sharing with an adjacent object. Also,
> the Thread itself is large enough so that two TLRs belonging to
> different threads will not collide.
>
> However, the leading fields are not enough: a few words that belong to
> another object can occupy the same cache line. This is where things can
> get worse in two ways: a) the TLR update can make field access in the
> adjacent object considerably slower; and, much worse, b) an update to
> the adjacent field can disturb the TLR state, which is critical for
> j.u.concurrent performance, since it relies heavily on fast TLR.
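
The cache-line reasoning above can be sketched as a small calculation. This is
an illustration under stated assumptions, not a measurement: it takes the
64-byte line size and the offsets from the layout dump in this message,
assumes objects start on 8-byte boundaries, and uses hypothetical helper names
(leadingNeighborBytes, trailingNeighborBytes). For a given object start
address it reports how many bytes of the neighbouring object land on the same
cache line as a chosen field.

```java
public class CacheLineSketch {

    /** Assumed cache line size in bytes (x86, as stated in the message). */
    public static final int LINE = 64;

    /** Bytes of the preceding object that share a cache line with the field. */
    public static int leadingNeighborBytes(int objectStart, int fieldOffset) {
        int lineStart = ((objectStart + fieldOffset) / LINE) * LINE;
        return Math.max(0, objectStart - lineStart);
    }

    /** Bytes of the following object that share a cache line with the field. */
    public static int trailingNeighborBytes(int objectStart, int fieldOffset, int objectSize) {
        int lineEnd = ((objectStart + fieldOffset) / LINE) * LINE + LINE;
        return Math.max(0, lineEnd - (objectStart + objectSize));
    }

    public static void main(String[] args) {
        int seedOffset = 48;   // Thread.threadLocalRandomSeed in the unpatched layout
        int objectSize = 120;  // unpatched instance size reported by the VM

        // Walk every 8-byte-aligned start position within one cache line.
        for (int start = 0; start < LINE; start += 8) {
            System.out.printf("start=%2d leading=%2d trailing=%2d%n",
                    start,
                    leadingNeighborBytes(start, seedOffset),
                    trailingNeighborBytes(start, seedOffset, objectSize));
        }
    }
}
```

Any non-zero value means the line holding the field also holds bytes of a
neighbouring object, which is the precondition for the false sharing
described here.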
>
> To illustrate both points, there is a simple benchmark driven by JMH
> (http://openjdk.java.net/projects/code-tools/jmh/):
> http://cr.openjdk.java.net/~shade/8014233/threadbench.zip
>
> On my 2x2 i5-2520M Linux x86_64 laptop, running the latest jdk8-tl with
> and without @Contended on Thread, that microbenchmark yields the following
> results [20x1 sec warmup, 20x1 sec measurements, 10 forks]:
>
> Accessing ThreadLocalRandom.current().nextInt():
> baseline: 932 +- 4 ops/usec
> @Contended: 927 +- 10 ops/usec
>
> Accessing TLR.current().nextInt() *and* Thread.getUEHandler():
> baseline: 454 +- 2 ops/usec
> @Contended: 490 +- 3 ops/usec
>
> One might note that $uncaughtExceptionHandler is the trailing field in
> Thread, so it can naturally be false-shared with the adjacent
> thread's TLR. We chose it as the illustration; in real cases, with a
> multitude of objects on the heap, the contender can be some other object.
>
> So that is a ~10% performance hit from false sharing, even on a very
> small machine. Translating it back: a heavily-updated field in an object
> adjacent to Thread can bring these overheads to TLR, and then jeopardize
> j.u.c performance.
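
The benchmark figures quoted above can be put side by side as relative
changes. This is only a sketch of the arithmetic on the reported means (it
ignores the +- error bounds), and relativeChangePercent is a hypothetical
helper name. Measured against the baseline, the second case works out to
roughly 8%, which the message rounds to ~10%.

```java
public class ThroughputDelta {

    /** Relative change of candidate versus baseline, in percent. */
    public static double relativeChangePercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Means taken from the results quoted in this message, in ops/usec.
        double tlrOnly = relativeChangePercent(932, 927);
        double tlrAndHandler = relativeChangePercent(454, 490);

        System.out.printf("TLR only:          %+.2f%%%n", tlrOnly);
        System.out.printf("TLR + UE handler:  %+.2f%%%n", tlrAndHandler);
    }
}
```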
>
> Of course, as soon as the status quo of the field layout changes, we might
> start to lose spectacularly. I would recommend we deal with this now, so
> fewer surprises come in the future.
>
> The caveat is that we are wasting some space per Thread instance.
> After the patch, the layout is:
>
>> java.lang.Thread
>> offset  size                     type description
>>      0    12                          (assumed to be the object header + first field alignment)
>>     12   128                          (alignment/padding gap)
>>    140     4                      int Thread.priority
>>    144     8                     long Thread.eetop
>>    152     8                     long Thread.stackSize
>>    160     8                     long Thread.nativeParkEventPointer
>>    168     8                     long Thread.tid
>>    176     8                     long Thread.threadLocalRandomSeed
>>    184     4                      int Thread.threadStatus
>>    188     4                      int Thread.threadLocalRandomProbe
>>    192     4                      int Thread.threadLocalRandomSecondarySeed
>>    196     1                  boolean Thread.single_step
>>    197     1                  boolean Thread.daemon
>>    198     1                  boolean Thread.stillborn
>>    199     1                          (alignment/padding gap)
>>    200     4                   char[] Thread.name
>>    204     4                   Thread Thread.threadQ
>>    208     4                 Runnable Thread.target
>>    212     4              ThreadGroup Thread.group
>>    216     4              ClassLoader Thread.contextClassLoader
>>    220     4     AccessControlContext Thread.inheritedAccessControlContext
>>    224     4           ThreadLocalMap Thread.threadLocals
>>    228     4           ThreadLocalMap Thread.inheritableThreadLocals
>>    232     4                   Object Thread.parkBlocker
>>    236     4            Interruptible Thread.blocker
>>    240     4                   Object Thread.blockerLock
>>    244     4 UncaughtExceptionHandler Thread.uncaughtExceptionHandler
>>    248                                (object boundary, size estimate)
>> VM reports 376 bytes per instance
>
> ...and we have an additional 256 bytes per Thread (twice the
> -XX:ContendedPaddingWidth, actually). That seems irrelevant compared to the
> space wasted in native memory for each thread, especially stack areas.
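
The space cost above can be checked with a short calculation. The 120-byte
size and the 128-byte padding width are taken from this message; the 1 MiB
native stack per thread is an assumption for illustration only (actual stack
sizes depend on platform and -Xss), and the class and method names are
hypothetical.

```java
public class PaddingCost {

    /** Instance size once padding is added both before and after the fields. */
    public static long paddedSize(long unpaddedSize, long paddingWidth) {
        return unpaddedSize + 2 * paddingWidth;
    }

    /** Added bytes as a percentage of some reference amount of memory. */
    public static double overheadPercent(long addedBytes, long referenceBytes) {
        return 100.0 * addedBytes / referenceBytes;
    }

    public static void main(String[] args) {
        long unpadded = 120;              // VM-reported size before the patch
        long paddingWidth = 128;          // padding width implied by the message
        long assumedStack = 1024 * 1024;  // ASSUMPTION: 1 MiB native stack per thread

        long padded = paddedSize(unpadded, paddingWidth);
        long added = padded - unpadded;

        System.out.println("padded size = " + padded + " bytes");
        System.out.println("added       = " + added + " bytes per Thread");
        System.out.printf("vs. assumed stack: %.4f%%%n", overheadPercent(added, assumedStack));
    }
}
```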
>
> Thanks,
> Aleksey.