ThreadLocal lookup elimination
Eirik Bjørsnøs
eirbjo at gmail.com
Mon Feb 22 11:26:13 UTC 2021
Hello,
ThreadLocals are commonly used to transfer state across method boundaries
in situations where passing that state as method parameters is impractical.
Examples include calling through APIs you don't own, or instrumenting code
you don't own.
Consider the following pseudo-instrumented code (where the original code
calls a getter inside a loop):
public class Student {
    private int age;

    public int maxAge(Student[] students) {
        // Instrumented code:
        ExpensiveObject expensive = new ExpensiveObject();
        expensive.recordSomething();
        threadLocal.set(expensive);

        // Original code:
        int max = 0;
        for (Student student : students) {
            max = Math.max(max, student.getAge());
        }
        return max;
    }

    public int getAge() {
        // Instrumented code:
        ExpensiveObject exp = threadLocal.get();
        exp.recordSomething();

        // Original code:
        return age;
    }

    // Instrumented field:
    private static final ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();

    // Minimal stub so the example compiles (hypothetical; the real object
    // holds whatever state the instrumentation records):
    static class ExpensiveObject {
        void recordSomething() { }
    }
}
The ThreadLocal is used here to avoid constructing ExpensiveObject
instances in each invocation of getAge.
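A self-contained, runnable version of the example might look like the sketch
below. The two counters and the ExpensiveObject body are hypothetical, added
only to make the effect visible: however many students there are, only one
ExpensiveObject is constructed per maxAge call, and every getAge call records
into that same instance via the ThreadLocal.

```java
public class ThreadLocalReuseDemo {
    // Demo-only counters (hypothetical instrumentation state).
    public static int constructions = 0;
    public static int records = 0;

    // Hypothetical stand-in for the instrumentation's state object.
    static final class ExpensiveObject {
        ExpensiveObject() { constructions++; }

        void recordSomething() { records++; }
    }

    private static final ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();

    static final class Student {
        private final int age;

        Student(int age) { this.age = age; }

        int getAge() {
            // Instrumented: reuse the object published by the caller
            // instead of constructing a new one on every call.
            threadLocal.get().recordSomething();
            return age;
        }
    }

    // Instrumented maxAge: one ExpensiveObject per call, shared with getAge().
    static int maxAge(Student[] students) {
        ExpensiveObject expensive = new ExpensiveObject();
        expensive.recordSomething();
        threadLocal.set(expensive);

        int max = 0;
        for (Student student : students) {
            max = Math.max(max, student.getAge());
        }
        return max;
    }

    // Convenience wrapper that builds Students from plain ages.
    public static int maxAgeOf(int... ages) {
        Student[] students = new Student[ages.length];
        for (int i = 0; i < ages.length; i++) {
            students[i] = new Student(ages[i]);
        }
        return maxAge(students);
    }

    public static void main(String[] args) {
        System.out.println(maxAgeOf(19, 23, 21)); // 23
        System.out.println(constructions);        // 1
        System.out.println(records);              // 4: one in maxAge, one per getAge()
    }
}
```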
However, once a compiler worth its salt sees this code, it will want to
inline getAge into the loop in maxAge, giving roughly:
// Instrumented code:
ExpensiveObject expensive = new ExpensiveObject();
expensive.recordSomething();
threadLocal.set(expensive);

// Original code:
int max = 0;
for (Student student : students) {
    // Instrumented code (inlined from getAge):
    ExpensiveObject exp = threadLocal.get();
    exp.recordSomething();

    // Original code (inlined from getAge):
    max = Math.max(max, student.age);
}
return max;
At this point, we see that the last write to threadLocal before the loop
stores 'expensive', and nothing inside the loop writes to it, so every
'threadLocal.get()' in the loop can be replaced by 'expensive'. So we could
do the following instead:
for (Student student : students) {
    // Instrumented code:
    expensive.recordSomething();

    // Original code:
    max = Math.max(max, student.age);
}
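To check that this substitution preserves behaviour, here is a runnable
sketch of both loop shapes side by side. ExpensiveObject is again a
hypothetical counting stub, and the Student array is simplified to an int[]
of ages. Both methods should report the same maximum and the same number of
recordSomething() calls; the only difference is that the second performs no
ThreadLocal lookups inside the loop.

```java
import java.util.Arrays;

public class LookupSubstitutionDemo {
    // Hypothetical stand-in that just counts recordSomething() calls.
    static final class ExpensiveObject {
        int records = 0;

        void recordSomething() { records++; }
    }

    private static final ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();

    // Shape after inlining getAge(): one ThreadLocal lookup per iteration.
    public static int[] withLookup(int[] ages) {
        ExpensiveObject expensive = new ExpensiveObject();
        expensive.recordSomething();
        threadLocal.set(expensive);

        int max = 0;
        for (int age : ages) {
            // Nothing in the loop calls set(), so this always yields 'expensive'.
            ExpensiveObject exp = threadLocal.get();
            exp.recordSomething();
            max = Math.max(max, age);
        }
        return new int[] { max, expensive.records };
    }

    // Shape after substitution: each get() replaced by the last value set().
    public static int[] withSubstitution(int[] ages) {
        ExpensiveObject expensive = new ExpensiveObject();
        expensive.recordSomething();
        threadLocal.set(expensive); // kept: later getAge() calls elsewhere may still read it

        int max = 0;
        for (int age : ages) {
            expensive.recordSomething();
            max = Math.max(max, age);
        }
        return new int[] { max, expensive.records };
    }

    public static void main(String[] args) {
        int[] ages = { 19, 23, 21 };
        System.out.println(Arrays.toString(withLookup(ages)));       // [23, 4]
        System.out.println(Arrays.toString(withSubstitution(ages))); // [23, 4]
    }
}
```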
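More generally, a compiler could record the value of the first lookup of a
ThreadLocal in a scope (or the value stored by the most recent set()) and
replace every following lookup with that value, until the next write,
including any write that might happen inside a call that was not inlined.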
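The rule above is essentially redundant-load elimination applied to
ThreadLocal.get(). Below is a minimal sketch of it over a hypothetical
straight-line IR in which every variable is assigned exactly once: a set()
or a first get() records which variable holds the ThreadLocal's current
value, later get()s become plain copies, and any opaque (non-inlined) call
forgets everything because the callee might write to the ThreadLocal. The
Insn and Kind types are invented for illustration; this is a toy model, not
HotSpot's actual IR, and a real implementation would also have to handle
loops and branches with a proper dataflow analysis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThreadLocalLookupElimination {
    public enum Kind { SET, GET, COPY, CALL }

    // One instruction of a hypothetical single-assignment, straight-line IR.
    public static final class Insn {
        final Kind kind;
        final String threadLocal; // SET/GET: which ThreadLocal
        final String dest;        // GET/COPY: variable being defined
        final String src;         // SET/COPY: variable being read

        private Insn(Kind kind, String threadLocal, String dest, String src) {
            this.kind = kind;
            this.threadLocal = threadLocal;
            this.dest = dest;
            this.src = src;
        }

        public static Insn set(String tl, String src) { return new Insn(Kind.SET, tl, null, src); }
        public static Insn get(String dest, String tl) { return new Insn(Kind.GET, tl, dest, null); }
        public static Insn copy(String dest, String src) { return new Insn(Kind.COPY, null, dest, src); }
        // An opaque call the compiler cannot see into.
        public static Insn call() { return new Insn(Kind.CALL, null, null, null); }

        @Override
        public String toString() {
            switch (kind) {
                case SET:  return threadLocal + ".set(" + src + ")";
                case GET:  return dest + " = " + threadLocal + ".get()";
                case COPY: return dest + " = " + src;
                default:   return "call()";
            }
        }
    }

    // Replaces each get() with a copy of the last known value, until a write
    // or an opaque call invalidates that knowledge.
    public static List<Insn> eliminate(List<Insn> code) {
        Map<String, String> known = new HashMap<>(); // ThreadLocal -> variable holding its value
        List<Insn> out = new ArrayList<>();
        for (Insn insn : code) {
            switch (insn.kind) {
                case SET:
                    known.put(insn.threadLocal, insn.src); // the write tells us the value
                    out.add(insn);                          // the write itself must stay
                    break;
                case GET: {
                    String cached = known.get(insn.threadLocal);
                    if (cached != null) {
                        out.add(Insn.copy(insn.dest, cached)); // lookup eliminated
                    } else {
                        known.put(insn.threadLocal, insn.dest); // first lookup: remember result
                        out.add(insn);
                    }
                    break;
                }
                case CALL:
                    known.clear(); // the callee might set() or remove() any ThreadLocal
                    out.add(insn);
                    break;
                default:
                    out.add(insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> code = List.of(
                Insn.set("tl", "expensive"),
                Insn.get("exp1", "tl"),
                Insn.get("exp2", "tl"),
                Insn.call(),
                Insn.get("exp3", "tl"),
                Insn.get("exp4", "tl"));
        eliminate(code).forEach(System.out::println);
    }
}
```

Running main prints the first two lookups rewritten to `exp1 = expensive` and
`exp2 = expensive`, keeps the lookup right after `call()` as
`exp3 = tl.get()`, and rewrites the last one to `exp4 = exp3`.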
I'm pretty sure this would be immensely useful for my current use case
(which instruments methods to count invocations), but perhaps it is also a
useful optimization in a more general sense? Examples that come to mind are
enterprise apps where transaction and security contexts are passed around
using ThreadLocals.
Has this type of optimization been discussed before? Is it even possible to
implement, or did I miss some dragons hiding in the details? What would the
estimated work for an implementation look like? Are we looking at a
bachelor's thesis? A master's thesis? A PhD?
Would love to hear some thoughts on this idea.
Cheers,
Eirik.