ThreadLocal lookup elimination

Fri Mar 5 17:41:24 UTC 2021

Hi Eirik,

The implementation of ThreadLocal is based on a hash map (ThreadLocal.ThreadLocalMap, a per-thread open-addressing table):
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ThreadLocal.java#L76

Currently it is "impossible" for the JIT compiler to reliably prove that the value get() reads from the hash map is the same one that set() stored.
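
To make the problem concrete, here is a heavily simplified model of the lookup path. It is a sketch under stated assumptions, not the JDK code: ToyThreadLocal, ToyThreadLocalMap and the MAPS field are hypothetical names, MAPS stands in for the per-thread Thread.threadLocals field, and weak keys, resizing and stale-entry cleanup are all omitted. It keeps the shape that matters here: get() recomputes a slot from a hash code and probes a mutable array, so proving that it returns what an earlier set() stored means proving that no store in between touched that array.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, heavily simplified model of ThreadLocal/ThreadLocalMap.
// Not the JDK implementation: no weak keys, no resizing, no stale-entry cleanup.
public class ToyThreadLocalDemo {

    // Open-addressing table owned by one thread (linear probing).
    static final class ToyThreadLocalMap {
        private final ToyThreadLocal<?>[] keys = new ToyThreadLocal<?>[16];
        private final Object[] values = new Object[16];

        void set(ToyThreadLocal<?> key, Object value) {
            int i = key.hash & (keys.length - 1);
            // Probe until we find the key or an empty slot (table assumed never full).
            while (keys[i] != null && keys[i] != key) {
                i = (i + 1) & (keys.length - 1);
            }
            keys[i] = key;
            values[i] = value; // a plain array store the JIT would have to track
        }

        Object get(ToyThreadLocal<?> key) {
            int i = key.hash & (keys.length - 1);
            while (keys[i] != null) {
                if (keys[i] == key) {
                    return values[i]; // a plain array load
                }
                i = (i + 1) & (keys.length - 1);
            }
            return null;
        }
    }

    // Stands in for the per-thread Thread.threadLocals field.
    static final Map<Thread, ToyThreadLocalMap> MAPS = new ConcurrentHashMap<>();

    static final class ToyThreadLocal<T> {
        // 0x61c88647 is the hash increment the JDK's ThreadLocal uses; reused for illustration.
        private static final AtomicInteger NEXT = new AtomicInteger();
        final int hash = NEXT.getAndAdd(0x61c88647);

        void set(T value) {
            MAPS.computeIfAbsent(Thread.currentThread(), t -> new ToyThreadLocalMap())
                .set(this, value);
        }

        @SuppressWarnings("unchecked")
        T get() {
            ToyThreadLocalMap map = MAPS.get(Thread.currentThread());
            return map == null ? null : (T) map.get(this);
        }
    }

    public static void main(String[] args) {
        ToyThreadLocal<String> a = new ToyThreadLocal<>();
        ToyThreadLocal<String> b = new ToyThreadLocal<>();
        a.set("x");
        b.set("y");
        System.out.println(a.get() + b.get()); // xy
    }
}
```
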

Also, because the ThreadLocal accessors' code is complex, some calls may not be inlined, and the JIT does not know what side effects they may have: it assumes they can modify the value.
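
A minimal illustration of the side-effect problem, with hypothetical names: helper() stands for any callee the JIT did not inline. Here it really does call set() on the same ThreadLocal, which is exactly the case the compiler has to allow for when it cannot see the callee's body, so the second get() cannot simply reuse the first result.

```java
// Hypothetical example: why a non-inlined call forces the JIT to re-read a ThreadLocal.
public class OpaqueCallDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Stands for any callee the JIT did not inline; from the caller's point of
    // view it might (and here does) overwrite the current thread's value.
    static void helper() {
        CONTEXT.set("changed-by-helper");
    }

    static String[] readAroundCall() {
        String before = CONTEXT.get();
        helper();
        String after = CONTEXT.get(); // cannot be replaced by 'before'
        return new String[] {before, after};
    }

    public static void main(String[] args) {
        CONTEXT.set("original");
        String[] r = readAroundCall();
        System.out.println(r[0] + " -> " + r[1]); // original -> changed-by-helper
    }
}
```
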

Thanks,
Vladimir K

On 2/22/21 3:26 AM, Eirik Bjørsnøs wrote:
> Hello,
> 
> ThreadLocals are commonly used to transfer state across method boundaries
> in situations where passing them as method parameters is impractical.
> Examples include crossing APIs you don't own, or instrumenting code you
> don't own.
> 
> Consider the following pseudo-instrumented code (where the original code
> calls a getter inside a loop):
> 
> public class Student {
> 
>      private int age;
> 
>      public int maxAge(Student[] students) {
> 
>          // Instrumented code:
>          ExpensiveObject expensive = new ExpensiveObject();
>          expensive.recordSomething();
>          threadLocal.set(expensive);
> 
>          // Original code:
>          int max = 0;
>          for (Student student : students) {
>              max = Math.max(max, student.getAge());
>          }
>          return max;
>      }
> 
>      public int getAge() {
>          // Instrumented code
>          ExpensiveObject exp = threadLocal.get();
>          exp.recordSomething();
> 
>          // Original code:
>          return age;
>      }
> 
>      // Instrumented field:
>      private static ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();
> }
> 
> The ThreadLocal is used here to avoid constructing ExpensiveObject
> instances in each invocation of getAge.
> 
> However, once a compiler worth its salt sees this code, it immediately
> wants to inline the getAge method:
> 
> // Instrumented code:
> ExpensiveObject expensive = new ExpensiveObject();
> expensive.recordSomething();
> threadLocal.set(expensive);
> 
> int max = 0;
> for (Student student : students) {
>      // Instrumented code
>      ExpensiveObject exp = threadLocal.get();
>      exp.recordSomething();
>      // Original code
>      max = Math.max(max, student.age);
> }
> 
> At this point, we see that the last write to threadLocal is 'expensive', so
> any following 'threadLocal.get()' could be replaced with 'expensive'.
> So we could do the following instead:
> 
> for (Student student : students) {
>      // Instrumented code
>      expensive.recordSomething();
>      // Original code
>      max = Math.max(max, student.age);
> }
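
A runnable sketch of the two loop shapes above, under a few assumptions: ExpensiveObject is a hypothetical stub that only counts recordSomething() calls, and the Student array is reduced to an int[] of ages so the example is self-contained. Both versions record the same number of events on the same object, which is what makes the substitution legal here; it would not be if anything inside the loop could call threadLocal.set() or threadLocal.remove().

```java
// Hypothetical, runnable version of the two loop shapes. ExpensiveObject is a
// stand-in that only counts recordSomething() calls.
public class HoistDemo {
    static final class ExpensiveObject {
        int recorded;
        void recordSomething() { recorded++; }
    }

    static final ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();

    // Shape after inlining getAge(): one ThreadLocal lookup per iteration.
    static int maxAgeInlined(int[] ages) {
        ExpensiveObject expensive = new ExpensiveObject();
        expensive.recordSomething();
        threadLocal.set(expensive);
        int max = 0;
        for (int age : ages) {
            ExpensiveObject exp = threadLocal.get(); // repeated lookup
            exp.recordSomething();
            max = Math.max(max, age);
        }
        return max;
    }

    // Shape after the proposed substitution: the loop reuses 'expensive'.
    // Valid here only because nothing in the loop writes threadLocal.
    static int maxAgeSubstituted(int[] ages) {
        ExpensiveObject expensive = new ExpensiveObject();
        expensive.recordSomething();
        threadLocal.set(expensive);
        int max = 0;
        for (int age : ages) {
            expensive.recordSomething();
            max = Math.max(max, age);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] ages = {19, 23, 21};
        System.out.println(maxAgeInlined(ages));          // 23
        System.out.println(threadLocal.get().recorded);   // 4
        System.out.println(maxAgeSubstituted(ages));      // 23
        System.out.println(threadLocal.get().recorded);   // 4
    }
}
```
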
> 
> More generally, a compiler could record the first lookup of a ThreadLocal
> in a scope and reuse that value for every following lookup (until
> the next write).
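
The rule in the paragraph above (reuse the first lookup until the next write) is essentially redundant-load elimination, with ThreadLocal get() and set() treated as loads and stores. A minimal sketch over a hypothetical toy IR, covering only straight-line code: a real compiler such as C2 would also have to merge facts at branches and loops, and decide which calls are opaque. Following the point about non-inlined calls, any opaque call here conservatively forgets everything known.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy IR and pass: forward ThreadLocal lookup elimination over
// straight-line code (no branches or loops). Not how C2 or Graal implement it.
public class TlLookupElimination {
    enum Kind { SET, GET, COPY, CALL }

    static final class Op {
        final Kind kind;
        final String tl;     // ThreadLocal name (SET/GET), null otherwise
        final String target; // variable defined by GET/COPY
        final String source; // value stored by SET, or copied by COPY

        Op(Kind kind, String tl, String target, String source) {
            this.kind = kind; this.tl = tl; this.target = target; this.source = source;
        }

        static Op set(String tl, String value) { return new Op(Kind.SET, tl, null, value); }
        static Op get(String target, String tl) { return new Op(Kind.GET, tl, target, null); }
        static Op call() { return new Op(Kind.CALL, null, null, null); }

        @Override public String toString() {
            switch (kind) {
                case SET:  return tl + ".set(" + source + ")";
                case GET:  return target + " = " + tl + ".get()";
                case COPY: return target + " = " + source;
                default:   return "call()";
            }
        }
    }

    static List<Op> eliminate(List<Op> ops) {
        // Last known value of each ThreadLocal on the current thread.
        Map<String, String> known = new HashMap<>();
        List<Op> out = new ArrayList<>();
        for (Op op : ops) {
            switch (op.kind) {
                case SET:
                    known.put(op.tl, op.source); // the write defines the new value
                    out.add(op);
                    break;
                case GET:
                    String v = known.get(op.tl);
                    if (v != null) {
                        out.add(new Op(Kind.COPY, null, op.target, v)); // reuse known value
                    } else {
                        known.put(op.tl, op.target); // first lookup becomes the known value
                        out.add(op);
                    }
                    break;
                case CALL:
                    known.clear(); // an opaque call may set() or remove() any ThreadLocal
                    out.add(op);
                    break;
                default:
                    out.add(op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Op> before = List.of(
            Op.set("tl", "expensive"),
            Op.get("e1", "tl"),
            Op.get("e2", "tl"),
            Op.call(),
            Op.get("e3", "tl"),
            Op.get("e4", "tl"));
        eliminate(before).forEach(System.out::println);
    }
}
```

For the sample input, the two lookups before the call become copies of 'expensive', the first lookup after the call stays a real get(), and the one after it becomes a copy of that result.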
> 
> I'm pretty sure this would be immensely useful for my current use case
> (which instruments methods to count invocations), but perhaps it is also a
> useful optimization in a more general sense? Examples that come to mind are
> enterprise apps where transaction and security contexts are passed around
> using ThreadLocals.
> 
> Has this type of optimization been discussed before? Is it even possible to
> implement, or did I miss some dragons hiding in the details? What would the
> estimated work for an implementation look like? Are we looking at a
> bachelor's thesis? A master's thesis? A PhD?
> 
> Would love to hear some thoughts on this idea.
> 
> Cheers,
> Eirik.
>