On constant folding of final field loads

Mon Jun 29 13:10:42 UTC 2015

Aleksey,

Thanks a lot for the feedback!

See my answers inline.

On 6/29/15 1:35 PM, Aleksey Shipilev wrote:
> Hi,
>
> On 06/27/2015 04:27 AM, Vladimir Ivanov wrote:
>> Current prototype:
>>    http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
>>    http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk
>>
>> The idea is simple: JIT tracks final field changes and throws away
>> nmethods which are affected.
>
> Big picture question: do we actually care about propagating final field
> values once the object escaped (and in this sense, available to be
> introspected by the compiler)?
>
> Java memory model does not guarantee the final field visibility when the
> object had escaped. The very reason why deserialization works is because
> the deserialized object had not yet been published.
>
> That is, are we in line with the spec and general expectations by
> folding the final values, *and* not deoptimizing on the store?
Can you elaborate a bit on your point and how it interacts with the JMM?

Are you suggesting not tracking constant-folded final field values at
all, since the JMM gives no guarantee that such updates are visible?
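
To make the question concrete, here is a minimal sketch of the kind of
update being discussed: an instance final field overwritten through
Reflection after construction. The class and method names
(FinalFieldWrite, Box, overwriteAndRead) are made up for illustration,
and the value is read back reflectively, so the sketch says nothing
about what JIT-compiled code that already folded the old value would
observe.

```java
import java.lang.reflect.Field;

class FinalFieldWrite {
    static final class Box {
        // Assigned in the constructor, so javac does not treat it as a
        // compile-time constant and inline it at use sites.
        final int value;
        Box(int value) { this.value = value; }
    }

    // Overwrites Box.value through Reflection and reads it back reflectively.
    static int overwriteAndRead(Box box, int newValue) {
        try {
            Field f = Box.class.getDeclaredField("value");
            f.setAccessible(true);   // needed before writing an instance final field
            f.setInt(box, newValue); // the store the JIT would have to notice
            return f.getInt(box);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Box box = new Box(1);
        System.out.println(overwriteAndRead(box, 2));
    }
}
```

If the JIT had folded box.value == 1 into an nmethod, the store above is
the event that either has to invalidate that nmethod or is allowed to
go unnoticed.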

>> Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly
>> states that returned value is not guaranteed to be a byte offset [1],
>> after following that road I don't see how offset encoding scheme can be
>> changed.
>
> Yes. Lots and lots of users rely on *fieldOffset to return the actual
> byte offset, even though it is not specified as such. This understanding
> is so prevalent, that it leaks into Unsafe.get*Unaligned, etc.
>
>
>> More realistically, since there are external dependencies on Unsafe API,
>> I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when
>> they are available in 9) all over JDK. Or temporarily make a private
>> copy (finally :-)) of field accessors from Unsafe, switch it to encoded
>> offsets, and use it in Reflection & java.lang.invoke API.
>
> Or, introduce Unsafe.invalidateFinalDep(Field/offset/etc), and add the
> call to it to Reflection accessors, MethodHandles invoke, VarHandle
> handles, etc. When/if Unsafe goes away, so do the unsafe
> (non-dependency-firing) final field stores. Raw memory access via Unsafe
> already escapes whatever traps you are setting in (oop + offset) path,
> so it would be nice to have the option to fire the dependency check for
> an arbitrary (?) offset.
>
>
>> Regarding alternative approaches to track the finality, an offset bitmap
>> on per-class basis can be used (containing locations of final fields).
>> Possible downsides are: (1) memory footprint (1/8th of instance size per
>> class); and (2) more complex checking logic (load a relevant piece of a
>> bitmap from a klass, instead of checking locally available offset
>> cookie). The advantage is that it is completely transparent to a user:
>> it doesn't change offset translation scheme.
>
> I like this one. Paying with slightly larger memory footprint for API
> compatibility sounds reasonable to me.

I don't care about cases where the Unsafe API is abused (e.g. raw memory
writes to an absolute address or to an arbitrary offset in an object).
In the end, it's an unsafe API, right? :-)

What I want to cover is proper use of the Unsafe API to access
instance/static fields. That's the part which is used in Reflection &
java.lang.invoke API. Unsafe is used there to bypass access checks.

It doesn't mean I'm fine with breaking existing user code. But since
Unsafe is not a supported API, I think some limited changes in a major
release (e.g. 9) are acceptable. What I'm trying to understand is to
what extent it can be changed.

My experiments show that simply changing the offset encoding strategy
doesn't work. There are cases where absolute offsets are needed.
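
A minimal sketch of the two tracking schemes mentioned above may help
here. The encoding is hypothetical (lowest bit of the offset cookie
marks finality); it is not necessarily what the prototype does, and all
names (FinalityTracking, encode, finalFieldBitmap) are invented for
illustration. It shows why code that does arithmetic on a cookie as if
it were a byte offset breaks, while a per-class bitmap leaves offsets
untouched.

```java
import java.util.BitSet;

class FinalityTracking {
    // Scheme 1 (hypothetical encoding): byte offset in the upper bits,
    // lowest bit of the cookie set when the field is final.
    static long encode(long byteOffset, boolean isFinal) {
        return (byteOffset << 1) | (isFinal ? 1L : 0L);
    }

    static long byteOffset(long cookie) { return cookie >>> 1; }

    static boolean isFinal(long cookie) { return (cookie & 1L) != 0; }

    // Scheme 2: per-class bitmap with one bit per byte of the instance
    // layout; a set bit marks the start offset of a final field.
    static BitSet finalFieldBitmap(int instanceSizeInBytes, long... finalFieldOffsets) {
        BitSet bits = new BitSet(instanceSizeInBytes);
        for (long off : finalFieldOffsets) {
            bits.set((int) off);
        }
        return bits;
    }

    public static void main(String[] args) {
        long cookie = encode(12, true);
        // User code that assumes "cookie + 4" means "4 bytes further"
        // ends up at a different decoded offset than 16.
        System.out.println("cookie + 4 decodes to offset " + byteOffset(cookie + 4));

        BitSet bitmap = finalFieldBitmap(24, 12);
        System.out.println("offset 12 final? " + bitmap.get(12));
        System.out.println("offset 16 final? " + bitmap.get(16));
    }
}
```

With the bitmap, the finality check needs the holder class at hand,
which is the extra lookup cost mentioned earlier; with the cookie, the
check is local but every consumer of raw offsets has to decode first.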

So, my next question is how to proceed. Does changing the API and
providing 2 sets of functions, one working with absolute offsets and one
with encoded offsets, solve the problem? Or does leaving Unsafe as is
(but clarifying the API) and migrating Reflection/j.l.i to VarHandles
solve it? That's what I'm trying to understand.

>
>> II. Managing relations between final fields and nmethods
>> Another aspect is how expensive dependency checking becomes.
>>
>> I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle
>> inlining heavily relies on constant folding of instance final fields.
>>
>>                        Before   After
>> checks (#)             420      12.5K
>> nmethods checked (#)   3K       1.5M
>> total time             60ms     2s
>> deps total             19K      26K
>>
>> Though total number of dependencies in VM didn't change much (+37% =
>> 19K->26K), total number of checked dependencies (500x: 3K -> 1.5M) and
>> time spent on dependency checking (30x: 60ms -> 2s) dramatically increased.
>>
>> The reason is that constant field value dependencies created heavily
>> populated contexts which are regularly checked:
>>
>>       #1                #2    #3/#4
>> Before
>>    KlassDep            254    47/2,632
>>    CallSiteDep         167    46/  358
>>
>> After
>>    ConstantFieldDep 11,790     0/1,494,112
>>    KlassDep            286    41/    2,769
>>    CallSiteDep         249    58/      393
>>
>> (#1 - dependency kind; #2 - total number of unique dependencies;
>> #3/#4 - invalidated nmethods/checked dependencies)
>
> Isn't the underlying problem that the dependencies are searched
> linearly? At least in ConstantFieldDep, can we compartmentalize the
> dependencies by holder class in some sort of hash table?
In some cases (when coarse-grained, per-class tracking is used), linear
traversal is fine, since all nmethods will be invalidated.

In order to construct a more efficient data structure, you need a way to
order or hash oops. The problem is that oops aren't stable: they can
change at any GC. So either some stable value has to be associated
with them (System.identityHashCode()?) or the dependency tables have to
be updated on every GC.
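
For illustration only, the identity-hash variant could look like the
Java-level sketch below. The real dependency machinery lives in the VM,
not in Java; ConstantFieldDepTable, record and invalidate are
hypothetical names, nmethods are represented by plain strings, and
holders are kept strongly reachable, which a real implementation would
have to avoid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class ConstantFieldDepTable {
    // One recorded dependency: a compiled method (just a name here) that
    // constant-folded a final field of 'holder'.
    private static final class Dep {
        final Object holder;
        final String nmethod;
        Dep(Object holder, String nmethod) {
            this.holder = holder;
            this.nmethod = nmethod;
        }
    }

    // Buckets keyed by identity hash, which does not change when GC
    // moves the object, unlike its address.
    private final Map<Integer, List<Dep>> buckets = new HashMap<>();

    void record(Object holder, String nmethod) {
        buckets.computeIfAbsent(System.identityHashCode(holder), k -> new ArrayList<>())
               .add(new Dep(holder, nmethod));
    }

    // On a final field write: remove and return the nmethods that depend
    // on 'holder', touching only one bucket instead of the whole table.
    List<String> invalidate(Object holder) {
        List<String> affected = new ArrayList<>();
        List<Dep> bucket = buckets.get(System.identityHashCode(holder));
        if (bucket == null) {
            return affected;
        }
        for (Iterator<Dep> it = bucket.iterator(); it.hasNext(); ) {
            Dep d = it.next();
            if (d.holder == holder) { // identity hashes can collide, so compare references
                affected.add(d.nmethod);
                it.remove();
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        ConstantFieldDepTable table = new ConstantFieldDepTable();
        Object a = new Object();
        Object b = new Object();
        table.record(a, "nmethod#1");
        table.record(b, "nmethod#2");
        table.record(a, "nmethod#3");
        System.out.println(table.invalidate(a));
    }
}
```

Even in this toy form the table needs per-holder bookkeeping that the
current linear contexts don't have, which is the extra complexity in
question.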

As long as the existing machinery can be sped up to an appropriate
level, I wouldn't consider complicating things so much.

The 3 optimizations I initially proposed make it possible to isolate
ConstantFieldDep from the other kinds of dependencies, so dependency
traversal speed will affect only final field writes, which is
acceptable IMO.

Best regards,
Vladimir Ivanov