RFR: 8003246: Add Supplier to ThreadLocal

Peter Levart peter.levart at gmail.com
Thu Dec 6 20:38:02 UTC 2012


On 12/06/2012 08:08 PM, Remi Forax wrote:
> On 12/06/2012 08:01 PM, Peter Levart wrote:
>> There's a quick trick that guarantees in-lining of get/set/remove:
>>
>>     public static class FastThreadLocal<T> extends ThreadLocal<T> {
>>         @Override
>>         public final T get() { return super.get(); }
>>
>>         @Override
>>         public final void set(T value) { super.set(value); }
>>
>>         @Override
>>         public final void remove() { super.remove(); }
>>     }
>>
>> ....just use static type FastThreadLocal everywhere in code.
>>
>> I tried it and it works.
>
> No, there is no way to have such a guarantee. Here it works either
> because the only ThreadLocal class you load is FastThreadLocal or
> because the VM has profiled the call site and seen that you only use
> FastThreadLocal for a specific instruction.

Nothing is certain but death and taxes, I agree.

But think deeper, Remi!

How do you explain the following test:

public class ThreadLocalTest {

     static class Int { int value; }

     static class TL0 extends ThreadLocal<Int> {}
     static class TL1 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
     static class TL2 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
     static class TL3 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
     static class TL4 extends ThreadLocal<Int> { public Int get() { return super.get(); } }

     static long doTest(ThreadLocal<Int> tl) {
         long t0 = System.nanoTime();
         for (int i = 0; i < 100000000; i++)
             tl.get().value++;
         return System.nanoTime() - t0;
     }

     static long doTest(FastThreadLocal<Int> tl) {
         long t0 = System.nanoTime();
         for (int i = 0; i < 100000000; i++)
             tl.get().value++;
         return System.nanoTime() - t0;
     }

     static long test0(ThreadLocal<Int> tl) {
         if (tl instanceof FastThreadLocal)
             return doTest((FastThreadLocal<Int>)tl);
         else
             return doTest(tl);
     }

     static void test(ThreadLocal<Int> tl) {
         tl.set(new Int());
         System.out.print(tl.getClass().getName() + ":");
         for (int i = 0; i < 8; i++)
             System.out.print(" " + test0(tl));
         System.out.println();
     }

     public static void main(String[] args) {
         TL0 tl0 = new TL0();
         test(tl0);
         test(new TL1());
         test(new TL2());
         test(new TL3());
         test(new TL4());
         test(tl0);
     }
}


Which prints the following (each number is the time in nanoseconds for one
100,000,000-iteration run; the last TL0 line is almost 2x slower than the
first):

test.ThreadLocalTest$TL0: 342716421 326105315 300744544 300654890 300726346 300752009 300700781 300735651
test.ThreadLocalTest$TL1: 321424139 312128166 312173383 312125203 312142144 312150949 316760957 313393554
test.ThreadLocalTest$TL2: 525661886 524169413 524184405 524215685 524162050 524400364 524174966 454370228
test.ThreadLocalTest$TL3: 472042229 471071328 464387909 468047355 464795171 464466481 464449567 464365974
test.ThreadLocalTest$TL4: 459651686 454142365 454129481 454180718 454217277 454109611 454119988 456978405
test.ThreadLocalTest$TL0: 582252322 582773455 582612509 582753610 582626360 582852195 582805654 582598285
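To put those totals in per-call terms, here is a quick back-of-the-envelope sketch. It uses only the last sample from the first and the last TL0 rows above; the class and method names are made up for illustration.

```java
// Illustrative arithmetic only; the two samples are copied from the output above.
class SlowdownEstimate {
    static final long ITERATIONS = 100_000_000L;   // loop count used in doTest()

    // Average cost of one loop iteration (tl.get().value++) in nanoseconds.
    static double nanosPerCall(long totalNanos) {
        return (double) totalNanos / ITERATIONS;
    }

    // How many times slower the second measurement is than the first.
    static double slowdown(long beforeNanos, long afterNanos) {
        return (double) afterNanos / beforeNanos;
    }

    public static void main(String[] args) {
        long firstTl0 = 300_735_651L;  // last sample, first TL0 row
        long lastTl0  = 582_598_285L;  // last sample, last TL0 row
        System.out.printf("before:   %.2f ns/iteration%n", nanosPerCall(firstTl0));
        System.out.printf("after:    %.2f ns/iteration%n", nanosPerCall(lastTl0));
        System.out.printf("slowdown: %.2fx%n", slowdown(firstTl0, lastTl0));
    }
}
```

That is roughly 3 ns per iteration before TL1..TL4 are loaded and a little under 6 ns afterwards, for the very same TL0 instance.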

Now with a simple change of:

     static class TL0 extends FastThreadLocal<Int> {}

...the same test prints:

test.ThreadLocalTest$TL0: 330722181 325823711 301171182 309992192 321868979 308111417 303806979 300612033
test.ThreadLocalTest$TL1: 330263857 326448062 300607081 300575641 307442821 300616794 300548457 303462898
test.ThreadLocalTest$TL2: 319627165 311309477 311465815 311279612 311294427 311315803 311470291 311293823
test.ThreadLocalTest$TL3: 526849874 524209792 524421574 524166747 524396011 524163313 524395641 524165429
test.ThreadLocalTest$TL4: 464963126 455172216 455466304 455245487 455368318 455093735 455125038 455317375
test.ThreadLocalTest$TL0: 300472239 300695398 300480230 303459397 300451419 300679904 300445717 300451166


And that's very repeatable! Try it for yourself (on JDK8 of course).
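
For anyone who wants a single file to start from, here is a minimal sketch that puts FastThreadLocal next to two timing loops, one typed as ThreadLocal and one typed as FastThreadLocal. It is not the full test above: the top-level class layout, FastThreadLocalDemo, timeVirtual and timeFinal are names and arrangements assumed here for illustration, and whether HotSpot actually inlines either call is still up to the JIT.

```java
// Sketch only: FastThreadLocal is the class quoted earlier in this thread,
// declared here as a top-level class (an assumption) so the file compiles alone.
class FastThreadLocal<T> extends ThreadLocal<T> {
    // final overrides: a call made through the static type FastThreadLocal
    // has exactly one possible target, whatever other subclasses get loaded.
    @Override public final T get() { return super.get(); }
    @Override public final void set(T value) { super.set(value); }
    @Override public final void remove() { super.remove(); }
}

class FastThreadLocalDemo {
    static final class Int { int value; }

    // Call site typed as ThreadLocal: get() is virtual and may be overridden.
    static long timeVirtual(ThreadLocal<Int> tl, int iterations) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            tl.get().value++;
        return System.nanoTime() - t0;
    }

    // Call site typed as FastThreadLocal: get() is final.
    static long timeFinal(FastThreadLocal<Int> tl, int iterations) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            tl.get().value++;
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        FastThreadLocal<Int> tl = new FastThreadLocal<>();
        tl.set(new Int());
        // Timings vary by machine and JVM; no particular numbers are implied.
        for (int i = 0; i < 8; i++) {
            System.out.println("virtual: " + timeVirtual(tl, 100_000_000)
                    + "  final: " + timeFinal(tl, 100_000_000));
        }
    }
}
```

To see the effect described above you still need to load and exercise the TL1..TL4 style subclasses between runs; this sketch only shows the two call-site shapes.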

Regards, Peter


>
>>
>>
>> Regards, Peter
>
> cheers,
> Rémi
>
>>
>> On 12/06/2012 01:03 PM, Doug Lea wrote:
>>> On 12/06/12 06:56, Vitaly Davidovich wrote:
>>>> Doug,
>>>>
>>>> When you see the fast-to-slow ThreadLocal transition due to class loading
>>>> invalidating inlined get(), do you not then see it get restored back to fast
>>>> mode since the receiver type in your call sites is still the monomorphic
>>>> ThreadLocal (and not the unrelated subclasses)? Just trying to understand what
>>>> Rémi and you are saying.
>>>>
>>>
>>> The possible outcomes are fairly non-deterministic, depending
>>> on hotspot's mood about recompiles, tiered-compile interactions,
>>> method size, Amdahl's law interactions, phase of moon, etc.
>>>
>>> (In j.u.c, we have learned that our users appreciate things
>>> being predictably fast enough rather than being
>>> unpredictably sometimes even faster but often slower.
>>> So when we see such cases, as with ThreadLocal, they get added
>>> to todo list.)
>>>
>>> -Doug
>>>
>>>
>>>
>>>
>>
>



