RFR: 8003246: Add Supplier to ThreadLocal

Peter Levart peter.levart at gmail.com
Thu Dec 6 21:21:02 UTC 2012


Ok, I've got an explanation.

It's not because of the different static types of the variables in the
code with final methods, but because TL0 was redirected to a separate
method with its own call sites. The same happens with this variant:

     static class TL0 extends ThreadLocal<Int> {}
     static class TL1 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
     static class TL2 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
     static class TL3 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
     static class TL4 extends ThreadLocal<Int> { public Int get() { return super.get(); } }

     static long doTest(ThreadLocal<Int> tl) {
         long t0 = System.nanoTime();
         for (int i = 0; i < 100000000; i++)
             tl.get().value++;
         return System.nanoTime() - t0;
     }

     static long doTest0(ThreadLocal<Int> tl) {
         long t0 = System.nanoTime();
         for (int i = 0; i < 100000000; i++)
             tl.get().value++;
         return System.nanoTime() - t0;
     }

     static long test0(ThreadLocal<Int> tl) {
         if (tl instanceof TL0)
             return doTest0(tl);
         else
             return doTest(tl);
     }
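The fragment above depends on Int, test() and main() from the full program quoted further down. Assembled into one file it looks roughly like the sketch below. The class name ThreadLocalVariantTest is an assumption and the comments are added; the code itself is taken from this thread. The only structural change from the quoted program is that TL0 gets its own textually identical loop, doTest0(), so, per the explanation above, its tl.get() call site never sees TL1..TL4.

```java
// Runnable assembly of the variant above: TL0 is routed to its own copy
// of the benchmark loop (doTest0), while TL1..TL4 share doTest.
public class ThreadLocalVariantTest {

    static class Int { int value; }

    static class TL0 extends ThreadLocal<Int> {}
    static class TL1 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
    static class TL2 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
    static class TL3 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
    static class TL4 extends ThreadLocal<Int> { public Int get() { return super.get(); } }

    // Shared loop: its tl.get() call site is reached by TL1..TL4.
    static long doTest(ThreadLocal<Int> tl) {
        long t0 = System.nanoTime();
        for (int i = 0; i < 100000000; i++)
            tl.get().value++;
        return System.nanoTime() - t0;
    }

    // Textually identical copy used only for TL0, so it has its own
    // call site (and therefore its own type profile).
    static long doTest0(ThreadLocal<Int> tl) {
        long t0 = System.nanoTime();
        for (int i = 0; i < 100000000; i++)
            tl.get().value++;
        return System.nanoTime() - t0;
    }

    static long test0(ThreadLocal<Int> tl) {
        if (tl instanceof TL0)
            return doTest0(tl);
        else
            return doTest(tl);
    }

    // Runs 8 timed rounds for one ThreadLocal and prints the timings in ns.
    static void test(ThreadLocal<Int> tl) {
        tl.set(new Int());
        System.out.print(tl.getClass().getName() + ":");
        for (int i = 0; i < 8; i++)
            System.out.print(" " + test0(tl));
        System.out.println();
    }

    public static void main(String[] args) {
        TL0 tl0 = new TL0();
        test(tl0);
        test(new TL1());
        test(new TL2());
        test(new TL3());
        test(new TL4());
        test(tl0); // TL0 again, after the other subclasses have been loaded
    }
}
```

Absolute numbers will of course vary by machine and JVM; the interesting comparison is the first TL0 line against the last.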


But I think that the deoptimizations Doug is talking about might be
prevented by using the following variant of TL:

public class FastThreadLocal<T> extends ThreadLocal<T> {
     public final T getFast() { return super.get(); }
     public final void setFast(T value) { super.set(value); }
     public final void removeFast() { super.remove(); }
}

and invoking the "fast" methods in code.
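A minimal usage sketch of the getFast() variant, assuming a hypothetical per-thread counter (FastThreadLocalDemo, COUNTER and increment() are illustrative names, not from this thread). Because getFast() exists only on FastThreadLocal, any call site that uses it must have FastThreadLocal as its static receiver type, and the method it resolves to is final. With the final-get() variant, by contrast, a variable typed as plain ThreadLocal still compiles, and its call site references the non-final ThreadLocal.get().

```java
// The getFast()/setFast()/removeFast() variant from above.
class FastThreadLocal<T> extends ThreadLocal<T> {
    public final T getFast() { return super.get(); }
    public final void setFast(T value) { super.set(value); }
    public final void removeFast() { super.remove(); }
}

public class FastThreadLocalDemo {
    // Hypothetical per-thread counter. initialValue() can still be
    // overridden, since only the *Fast methods are final.
    static final FastThreadLocal<int[]> COUNTER = new FastThreadLocal<int[]>() {
        @Override
        protected int[] initialValue() { return new int[1]; }
    };

    // Call site typed FastThreadLocal, bound to the final getFast().
    static int increment() {
        return ++COUNTER.getFast()[0];
    }

    public static void main(String[] args) throws InterruptedException {
        increment();
        increment();
        System.out.println(increment()); // prints 3 (main thread's value)

        // Another thread starts from its own initialValue().
        Thread other = new Thread(() -> System.out.println(increment())); // prints 1
        other.start();
        other.join();

        COUNTER.removeFast(); // next getFast() re-runs initialValue()
        System.out.println(increment()); // prints 1
    }
}
```

Whether this really avoids the deoptimizations Doug describes is exactly what is being debated here; the sketch only shows that the semantics are those of a plain ThreadLocal.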


Right?

Regards, Peter


On 12/06/2012 09:38 PM, Peter Levart wrote:
> On 12/06/2012 08:08 PM, Remi Forax wrote:
>> On 12/06/2012 08:01 PM, Peter Levart wrote:
>>> There's a quick trick that guarantees in-lining of get/set/remove:
>>>
>>>     public static class FastThreadLocal<T> extends ThreadLocal<T> {
>>>         @Override
>>>         public final T get() { return super.get(); }
>>>
>>>         @Override
>>>         public final void set(T value) { super.set(value); }
>>>
>>>         @Override
>>>         public final void remove() { super.remove(); }
>>>     }
>>>
>>> ....just use static type FastThreadLocal everywhere in code.
>>>
>>> I tried it and it works.
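A sketch of how the final-override version might be used, with a hypothetical per-thread StringBuilder (FinalOverrideDemo, BUFFER and append() are illustrative names). javac still emits invokevirtual for BUFFER.get(), but because the field's static type is FastThreadLocal, the call resolves to a final method, which HotSpot can typically bind statically instead of relying on the call site's type profile. As Remi points out below, that is not a guarantee of inlining.

```java
public class FinalOverrideDemo {

    // Final overrides delegating to ThreadLocal via super calls
    // (invokespecial, so no virtual dispatch inside them).
    public static class FastThreadLocal<T> extends ThreadLocal<T> {
        @Override
        public final T get() { return super.get(); }

        @Override
        public final void set(T value) { super.set(value); }

        @Override
        public final void remove() { super.remove(); }
    }

    // Hypothetical holder: the field's static type is FastThreadLocal, so
    // call sites below reference the final overrides.
    static final FastThreadLocal<StringBuilder> BUFFER = new FastThreadLocal<StringBuilder>() {
        // Still overridable: only get/set/remove are final.
        @Override
        protected StringBuilder initialValue() { return new StringBuilder(); }
    };

    static String append(String s) {
        return BUFFER.get().append(s).toString();
    }

    public static void main(String[] args) {
        append("a");
        System.out.println(append("b")); // prints ab
        BUFFER.remove();
        System.out.println(append("c")); // prints c

        // Still legal, but this call site is typed ThreadLocal, so it
        // references the non-final ThreadLocal.get().
        ThreadLocal<StringBuilder> viaBase = BUFFER;
        System.out.println(viaBase.get()); // prints c
    }
}
```

The last lines show why "use static type FastThreadLocal everywhere" matters: a reference widened to ThreadLocal behaves the same at run time, but the call site no longer names a final method.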
>>
>> No, there is no way to have such a guarantee. Here, it works either
>> because FastThreadLocal is the only ThreadLocal subclass you load, or
>> because the VM has profiled the call site and seen that you only use
>> FastThreadLocal at that specific instruction.
>
> Nothing is certain but death and taxes, I agree.
>
> But think deeper, Remi!
>
> How do you explain the following test:
>
> public class ThreadLocalTest {
>
>     static class Int { int value; }
>
>     static class TL0 extends ThreadLocal<Int> {}
>     static class TL1 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
>     static class TL2 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
>     static class TL3 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
>     static class TL4 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
>
>     static long doTest(ThreadLocal<Int> tl) {
>         long t0 = System.nanoTime();
>         for (int i = 0; i < 100000000; i++)
>             tl.get().value++;
>         return System.nanoTime() - t0;
>     }
>
>     static long doTest(FastThreadLocal<Int> tl) {
>         long t0 = System.nanoTime();
>         for (int i = 0; i < 100000000; i++)
>             tl.get().value++;
>         return System.nanoTime() - t0;
>     }
>
>     static long test0(ThreadLocal<Int> tl) {
>         if (tl instanceof FastThreadLocal)
>             return doTest((FastThreadLocal<Int>)tl);
>         else
>             return doTest(tl);
>     }
>
>     static void test(ThreadLocal<Int> tl) {
>         tl.set(new Int());
>         System.out.print(tl.getClass().getName() + ":");
>         for (int i = 0; i < 8; i++)
>             System.out.print(" " + test0(tl));
>         System.out.println();
>     }
>
>     public static void main(String[] args) {
>         TL0 tl0 = new TL0();
>         test(tl0);
>         test(new TL1());
>         test(new TL2());
>         test(new TL3());
>         test(new TL4());
>         test(tl0);
>     }
> }
>
>
> Which prints the following (nanoseconds per 100000000 iterations; note
> the almost 2x slowdown of TL0 in the last line compared to the first):
>
> test.ThreadLocalTest$TL0: 342716421 326105315 300744544 300654890 300726346 300752009 300700781 300735651
> test.ThreadLocalTest$TL1: 321424139 312128166 312173383 312125203 312142144 312150949 316760957 313393554
> test.ThreadLocalTest$TL2: 525661886 524169413 524184405 524215685 524162050 524400364 524174966 454370228
> test.ThreadLocalTest$TL3: 472042229 471071328 464387909 468047355 464795171 464466481 464449567 464365974
> test.ThreadLocalTest$TL4: 459651686 454142365 454129481 454180718 454217277 454109611 454119988 456978405
> test.ThreadLocalTest$TL0: 582252322 582773455 582612509 582753610 582626360 582852195 582805654 582598285
>
> Now with a simple change of:
>
>     static class TL0 extends FastThreadLocal<Int> {}
>
> ...the same test prints:
>
> test.ThreadLocalTest$TL0: 330722181 325823711 301171182 309992192 321868979 308111417 303806979 300612033
> test.ThreadLocalTest$TL1: 330263857 326448062 300607081 300575641 307442821 300616794 300548457 303462898
> test.ThreadLocalTest$TL2: 319627165 311309477 311465815 311279612 311294427 311315803 311470291 311293823
> test.ThreadLocalTest$TL3: 526849874 524209792 524421574 524166747 524396011 524163313 524395641 524165429
> test.ThreadLocalTest$TL4: 464963126 455172216 455466304 455245487 455368318 455093735 455125038 455317375
> test.ThreadLocalTest$TL0: 300472239 300695398 300480230 303459397 300451419 300679904 300445717 300451166
>
>
> And that's very repeatable! Try it for yourself (on JDK8 of course).
>
> Regards, Peter
>
>
>>
>>>
>>>
>>> Regards, Peter
>>
>> cheers,
>> Rémi
>>
>>>
>>> On 12/06/2012 01:03 PM, Doug Lea wrote:
>>>> On 12/06/12 06:56, Vitaly Davidovich wrote:
>>>>> Doug,
>>>>>
>>>>> When you see the fast to slow ThreadLocal transition due to class 
>>>>> loading
>>>>> invalidating inlined get(), do you not then see it get restored 
>>>>> back to fast
>>>>> mode since the receiver type in your call sites is still the 
>>>>> monomorphic
>>>>> ThreadLocal (and not the unrelated subclasses)? Just trying to 
>>>>> understand what
>>>>> Rémi and you are saying.
>>>>>
>>>>
>>>> The possible outcomes are fairly non-deterministic, depending
>>>> on hotspot's mood about recompiles, tiered-compile interactions,
>>>> method size, Amdahl's law interactions, phase of moon, etc.
>>>>
>>>> (In j.u.c, we have learned that our users appreciate things
>>>> being predictably fast enough rather than being
>>>> unpredictably sometimes even faster but often slower.
>>>> So when we see such cases, as with ThreadLocal, they get added
>>>> to todo list.)
>>>>
>>>> -Doug
>>>>
>>>>
>>>>
>>>>
>>>
>>
>




More information about the core-libs-dev mailing list