Improving ThreadLocalRandom (and related classes)

Wed Jan 9 10:55:09 UTC 2013

On 01/08/2013 08:33 PM, Doug Lea wrote:
> However, the actual ThreadLocalRandom object is padded to avoid 
> memory contention (which wouldn't be necessary or useful if already
> embedded within Thread).

I'm tempted to disagree. While it is true most of the callers are
accessing Thread in the context of currentThread(), and most of the
Thread state is not updated, it can catastrophically break down once we
cram in the heavily updated fields.

E.g. this is the java.lang.Thread field layout as of 7u12:

$ java -jar java-object-layout.jar java.lang.Thread
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

java.lang.Thread
 offset  size                     type description
      0    12                          (assumed to be the object header + first field alignment)
     12     4                      int Thread.priority
     16     8                     long Thread.eetop
     24     8                     long Thread.stackSize
     32     8                     long Thread.nativeParkEventPointer
     40     8                     long Thread.tid
     48     4                      int Thread.threadStatus
     52     1                  boolean Thread.single_step
     53     1                  boolean Thread.daemon
     54     1                  boolean Thread.stillborn
     55     1                          (alignment/padding gap)
     56     4                   char[] Thread.name
     60     4                   Thread Thread.threadQ
     64     4                 Runnable Thread.target
     68     4              ThreadGroup Thread.group
     72     4              ClassLoader Thread.contextClassLoader
     76     4     AccessControlContext Thread.inheritedAccessControlContext
     80     4           ThreadLocalMap Thread.threadLocals
     84     4           ThreadLocalMap Thread.inheritableThreadLocals
     88     4                   Object Thread.parkBlocker
     92     4            Interruptible Thread.blocker
     96     4                   Object Thread.blockerLock
    100     4 UncaughtExceptionHandler Thread.uncaughtExceptionHandler
    104                                (object boundary, size estimate)

That means adding a few primitive fields can easily put them on the same
cache line as the fields of another Thread, making false sharing a real issue.
Padding out the inlined TLR state would save us from this trouble
(thankfully, @Contended can do that without the magical field
arrangements and finger crossing).
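To make the overlap concrete, here is a rough sketch of the cache line
arithmetic. It assumes 64-byte cache lines, a Thread that starts on a line
boundary, and a second Thread allocated immediately after it; none of these
are stated above, and the class and variable names are hypothetical.

```java
public class CacheLineSketch {
    // Assumption: 64-byte cache lines (common on x86; not stated in the post).
    static final long CACHE_LINE = 64;

    // Object size reported by the layout dump above.
    static final long THREAD_SIZE = 104;

    /** Index of the cache line that holds the given byte address. */
    public static long lineOf(long address) {
        return address / CACHE_LINE;
    }

    /** True when both byte addresses land on the same cache line. */
    public static boolean sameLine(long a, long b) {
        return lineOf(a) == lineOf(b);
    }

    public static void main(String[] args) {
        // Hypothetical: Thread A starts on a cache line boundary and gains one
        // 8-byte hot field at offset 104, growing the object to 112 bytes.
        long threadA = 0;
        long hotFieldA = threadA + THREAD_SIZE;

        // Hypothetical: Thread B is allocated immediately after A.
        long threadB = threadA + THREAD_SIZE + 8;
        long priorityB = threadB + 12; // Thread.priority offset from the dump

        System.out.println("A's hot field is on line " + lineOf(hotFieldA));
        System.out.println("B's header starts on line " + lineOf(threadB));
        System.out.println("A's hot field shares a line with B.priority: "
                + sameLine(hotFieldA, priorityB));
    }
}
```

Under these assumptions A's new field and the start of B sit on the same
line, so every write to A's field invalidates the line B reads its header
and first fields from.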

We can @Contended the whole Thread, which means pushing Thread to
consume 256 bytes instead of 104+ as it is now. While this seems to be
a large increase, it is a global win since the padded TLR state is gone,
and we are effectively hiding the Thread state in the "padding shadow".
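For comparison, this is roughly what manual padding of a hot field looks
like without @Contended. It is only a sketch: the class names are
hypothetical, the 56 bytes of filler on each side assume 64-byte cache
lines, and the ordering relies on HotSpot placing superclass fields before
subclass fields, which the language does not guarantee (hence the finger
crossing). The update uses the java.util.Random LCG constants purely as a
stand-in for a frequently written field.

```java
public class PaddingSketch {
    // Seven longs = 56 bytes of filler ahead of the hot field.
    static class LeftPad {
        long p0, p1, p2, p3, p4, p5, p6;
    }

    // The hot field lives in its own class so it is laid out after LeftPad
    // (assumption: superclass fields come first, as HotSpot does in practice).
    static class HotState extends LeftPad {
        long seed;
    }

    // Seven more longs = 56 bytes of filler after the hot field.
    public static final class PaddedSeed extends HotState {
        long q0, q1, q2, q3, q4, q5, q6;

        public PaddedSeed(long initial) {
            this.seed = initial;
        }

        /** Advances the seed with the java.util.Random LCG step and returns it. */
        public long advance() {
            seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
            return seed;
        }
    }

    public static void main(String[] args) {
        PaddedSeed s = new PaddedSeed(0L);
        System.out.println(s.advance());
        System.out.println(s.advance());
    }
}
```

The filler fields are never read or written; they exist only to keep
neighbouring data off the line that holds seed. That costs 112 bytes per
instance, which is the overhead the "padding shadow" argument is about.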

My 2c.

-Aleksey.