Scoped variables

Tue Dec 4 21:38:59 UTC 2018

On 12/4/18 12:32 PM, Doug Lea wrote:
> On 12/4/18 12:12 PM, Andrew Haley wrote:
>> Exactly, yes. The problem is that the current ThreadLocal code is very
>> complex, and if we restrict ourselves to a simple get() we can do
>> better.
> Could you explain? Of the possibilities I'm aware of that might be cheaper:
>
> * The cheapest version is to access a field of current Thread/Fiber, as
> is possible with Threads by defining Thread subclasses.
>
> * Close behind is to have an index associated with each Thread/Fiber
> that users could then use to access data in a separate array (or
> whatever) that they otherwise manage themselves.

I suppose we could also invert the above and have an index associated with
each thread local, used to access an array in each fiber.  We could allocate
the index for the life of the VM if we had a flavor of "permanent" thread
local.  However, recycling indexes and resizing arrays when a non-permanent
thread local becomes collectable by the GC doesn't sound trivial to me.
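For illustration, a minimal Java sketch of the inverted scheme, assuming "permanent" locals whose indexes are never recycled. `PermanentLocal` and its nested `CarrierThread` are hypothetical names; the carrier is a `Thread` subclass with an `Object[]` field, standing in for a field on Thread/Fiber. The sketch deliberately sidesteps the hard part: indexes are never reclaimed, so each thread's array only grows.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical "permanent" thread local: each instance owns a fixed index
 * for the life of the VM, and each carrier thread owns an array of slots.
 */
public class PermanentLocal<T> {
    // Indexes are handed out once and never recycled.
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement();

    /** Stand-in for a Thread/Fiber that carries the per-thread slot array. */
    public static class CarrierThread extends Thread {
        Object[] slots = new Object[8];

        public CarrierThread(Runnable task) {
            super(task);
        }
    }

    // Assumes the caller is running on a CarrierThread (ClassCastException otherwise).
    private static CarrierThread carrier() {
        return (CarrierThread) Thread.currentThread();
    }

    @SuppressWarnings("unchecked")
    public T get() {
        Object[] slots = carrier().slots;
        // Fast path: a bounds check plus one array load; unset slots read as null.
        return index < slots.length ? (T) slots[index] : null;
    }

    public void set(T value) {
        CarrierThread t = carrier();
        Object[] slots = t.slots;
        if (index >= slots.length) {
            // Only the owning thread touches its array, so no synchronization.
            slots = Arrays.copyOf(slots, Math.max(index + 1, slots.length * 2));
            t.slots = slots;
        }
        slots[index] = value;
    }
}
```

Ignoring what the JIT does with `Thread.currentThread()`, `get()` needs the index field, the carrier's slot array, a bounds check and the element load. Supporting non-permanent locals would mean reclaiming indexes and clearing the matching slot in every thread, which is the part that looks non-trivial.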

> * Of variants hinted at by John Rose, the only potentially fast kind I
> know would be to stack-allocate at initial frames of a Thread/Fiber, and
> use a new form of VarHandle that can be passed in calls or even somehow
> implicitly accessed via some form of "display" so they can be accessed
> by children (in the same or a nested Thread/Fiber)
> (see https://en.wikipedia.org/wiki/Call_stack#Lexically_nested_routines)

If I understand this correctly, we wouldn't have to map from a shared key
object to something stack-local, because we would only allow access through
something that is already stack-local.  However, I don't see how the inner
callee's lookup could be directly associated with the outer binding.  If
each frame was passed a hidden list of bound VarHandles, then it seems like
the lookup would still need to search that list, though the list is probably
short.
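To make that lookup cost concrete, a minimal Java sketch of the "hidden list" reading, assuming each binding is an immutable cons cell passed explicitly to callees in place of a hidden frame parameter. `Bindings` and `Bindings.Key` are hypothetical names, and plain identity-compared keys stand in for the bound VarHandles; no real VarHandle API is modeled.

```java
import java.util.Objects;

/**
 * Hypothetical immutable list of scoped bindings. Each frame would receive
 * the list as a hidden argument; here it is passed explicitly.
 */
public final class Bindings {
    /** Hypothetical key identifying one scoped variable (compared by identity). */
    public static final class Key<T> {
        private final String name;

        public Key(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** The empty list: the only node whose key is null. */
    public static final Bindings EMPTY = new Bindings(null, null, null);

    private final Key<?> key;
    private final Object value;
    private final Bindings next; // bindings made by outer (calling) frames

    private Bindings(Key<?> key, Object value, Bindings next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }

    /** Returns a longer list for callees; this list is left unchanged. */
    public <T> Bindings bind(Key<T> k, T v) {
        return new Bindings(Objects.requireNonNull(k), v, this);
    }

    /** Linear search from the innermost binding outward. */
    @SuppressWarnings("unchecked")
    public <T> T get(Key<T> k) {
        for (Bindings b = this; b.key != null; b = b.next) {
            if (b.key == k) {
                return (T) b.value;
            }
        }
        throw new IllegalStateException("unbound: " + k);
    }
}
```

A callee would take `Bindings env` as an extra parameter and call `env.get(key)`. Rebinding in an inner scope shadows the outer binding without mutating it, and the cost of `get` grows with the number of bindings made between the binding site and the lookup.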

dl

> * Some updated form of RTSJ ScopedMemory regions.
>
> And ...
>
>> The fast path is 12 field loads, 5 conditional branches, and these are
>> dependent loads, so have a lot of latency. We also suffer a fair bit
>> from mispredicted branches, from the look of the profile.
> * It would be possible to create a variant of ThreadLocal that does not
> use WeakReferences, requiring explicit removal. This would reduce
> several loads.
>
> * Some usages might be able to tolerate a version providing only
> "static" ThreadLocals, that can use compile-time constants vs hashed keys.
>
> Short of these restrictions, despite the overhead, current ThreadLocals
> seem to be faster than any other general mechanism anyone has tried. But
> any further ideas for making them cheaper would be welcome.
>
> (Note also that we are sitting on some updates that will reduce garbage
> retention of ThreadLocals under some GCs at the expense of adding a
> "long" field per ThreadLocal. See
> http://cr.openjdk.java.net/~plevart/misc/JustMarkingReferenceQueue/webrev.04/)
>
> -Doug
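A minimal Java sketch of the first restriction quoted above (a ThreadLocal variant without WeakReferences that requires explicit removal), assuming a `Thread` subclass that owns a small open-addressing table. `StrongLocal` and its `CarrierThread` are hypothetical; the golden-ratio hash increment mirrors the one `java.lang.ThreadLocal` uses, but this is not the JDK's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ThreadLocal variant holding keys strongly (no WeakReference);
// a value stays reachable until remove() is called.
public class StrongLocal<T> {
    // Same golden-ratio increment java.lang.ThreadLocal uses to spread hashes.
    private static final int HASH_INCREMENT = 0x61c88647;
    private static final AtomicInteger NEXT_HASH = new AtomicInteger();

    private final int hash = NEXT_HASH.getAndAdd(HASH_INCREMENT);

    /** Stand-in for a Thread/Fiber owning a linear-probing table. */
    public static class CarrierThread extends Thread {
        Object[] keys = new Object[16]; // StrongLocal instances, held strongly
        Object[] values = new Object[16];
        int size;

        public CarrierThread(Runnable task) {
            super(task);
        }
    }

    // Assumes the caller is running on a CarrierThread.
    private static CarrierThread carrier() {
        return (CarrierThread) Thread.currentThread();
    }

    @SuppressWarnings("unchecked")
    public T get() {
        CarrierThread t = carrier();
        Object[] keys = t.keys;
        int mask = keys.length - 1;
        // Identity comparison on a strong key: no Reference.get() per probe.
        for (int i = hash & mask; keys[i] != null; i = (i + 1) & mask) {
            if (keys[i] == this) return (T) t.values[i];
        }
        return null;
    }

    public void set(T value) {
        CarrierThread t = carrier();
        if ((t.size + 1) * 2 > t.keys.length) resize(t); // load factor <= 1/2
        put(t, this, value);
    }

    public void remove() {
        CarrierThread t = carrier();
        Object[] keys = t.keys;
        int mask = keys.length - 1;
        int i = hash & mask;
        while (keys[i] != null && keys[i] != this) {
            i = (i + 1) & mask;
        }
        if (keys[i] == null) return; // never set on this thread
        keys[i] = null;
        t.values[i] = null;
        t.size--;
        // Re-insert the rest of the probe cluster so no entry is orphaned.
        for (int j = (i + 1) & mask; keys[j] != null; j = (j + 1) & mask) {
            StrongLocal<?> k = (StrongLocal<?>) keys[j];
            Object v = t.values[j];
            keys[j] = null;
            t.values[j] = null;
            t.size--;
            put(t, k, v);
        }
    }

    private static void put(CarrierThread t, StrongLocal<?> key, Object value) {
        Object[] keys = t.keys;
        int mask = keys.length - 1;
        int i = key.hash & mask;
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) & mask;
        }
        if (keys[i] == null) {
            keys[i] = key;
            t.size++;
        }
        t.values[i] = value;
    }

    private static void resize(CarrierThread t) {
        Object[] oldKeys = t.keys;
        Object[] oldValues = t.values;
        t.keys = new Object[oldKeys.length * 2];
        t.values = new Object[oldKeys.length * 2];
        t.size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != null) put(t, (StrongLocal<?>) oldKeys[i], oldValues[i]);
        }
    }
}
```

Because keys are compared by identity and held strongly, `get()` never dereferences a `Reference`. The trade-off is that a value stays reachable until `remove()` is called or the carrier thread itself becomes unreachable.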