On constant folding of final field loads

Tue Jun 30 19:00:42 UTC 2015

Aleksey,

>>> Big picture question: do we actually care about propagating final field
>>> values once the object escaped (and in this sense, available to be
>>> introspected by the compiler)?
>>>
>>> Java memory model does not guarantee the final field visibility when the
>>> object had escaped. The very reason why deserialization works is because
>>> the deserialized object had not yet been published.
>>>
>>> That is, are we in line with the spec and general expectations by
>>> folding the final values, *and* not deoptimizing on the store?
>> Can you elaborate on your point and interaction with JMM a bit?
>>
>> Are you talking about not tracking constant folded final field values at
>> all, since there are no guarantees by JMM such updates are visible?
>
> Yup. AFAIU the JMM, there is no guarantees you would see the updated
> value for final field after the object had leaked. So, spec-wise you may
> just use the final field values as constants. I think the only reason
> you have to do the dependency tracking is when constant folding depends
> on instance identity.
>
> So, my question is, do we knowingly make a goodwill call to deopt on
> final field store, even though it is not required by spec? I am not
> opposing the change, but I'd like us to understand the implications better.
That's a good question.

I consider it more of a quality-of-implementation aspect. Neither the
Reflection nor the Unsafe API is part of the JVM/JLS spec, so I don't think
the possibility of final field updates should be taken into account there.

In order to avoid surprises and inconsistencies (old value vs new value
depending on execution path), which are *very* hard to track down, the VM
should either completely forbid final field changes or keep track of
them and adapt accordingly.
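
To make the kind of update in question concrete, here is a minimal
sketch of a reflective final field store. The class and field names are
made up for illustration. The field is assigned in the constructor on
purpose: a final initialized with a compile-time constant in its
declaration would already be inlined by javac. Which value a direct read
observes after the update is not guaranteed (JLS 17.5.3), so the sketch
only relies on the reflective read.

```java
import java.lang.reflect.Field;

final class FinalUpdateDemo {
    // Hypothetical holder class, for illustration only.
    static final class Worker {
        final boolean running;

        Worker() {
            // Assigned here so the field is not a compile-time constant.
            this.running = true;
        }
    }

    // Flips Worker.running through reflection and returns the value
    // reported by a subsequent reflective read.
    static boolean flipAndReadBack(Worker w) {
        try {
            Field f = Worker.class.getDeclaredField("running");
            f.setAccessible(true);   // required to write a final instance field
            f.setBoolean(w, false);  // the "final field store" under discussion
            return f.getBoolean(w);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Worker w = new Worker();
        System.out.println("reflective read after update: " + flipAndReadBack(w));
        // A direct read in already-compiled code may or may not observe
        // the update if the JIT folded the load.
        System.out.println("direct read after update: " + w.running);
    }
}
```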

> For example, I can see the change gives rise to some interesting
> low-level coding idioms, like:
>
> final boolean running = true;
> Field runningField = resolve(...); // reflective
>
> // run stuff for minutes
> void m() {
>    while (running) { // compiler hoists, turns into while(true)
>       // do stuff
>    }
> }
>
> void hammerTime() {
>    runningField.set(this, false); // deopt, break the loop!
> }
>
> Once we allow users to go crazy like that, it would be cruel to
> retract/break/change this behavior.
>
> But I speculate those cases are not pervasive. By and large, people care
> about final ops to jump through the barriers. For example, the final
> load can be commonned through the acquires / control flow. See e.g.:
>   http://psy-lob-saw.blogspot.ru/2014/02/when-i-say-final-i-mean-final.html
>
>>>> Regarding alternative approaches to track the finality, an offset bitmap
>>>> on per-class basis can be used (containing locations of final fields).
>>>> Possible downsides are: (1) memory footprint (1/8th of instance size per
>>>> class); and (2) more complex checking logic (load a relevant piece of a
>>>> bitmap from a klass, instead of checking locally available offset
>>>> cookie). The advantage is that it is completely transparent to a user:
>>>> it doesn't change offset translation scheme.
>>>
>>> I like this one. Paying with slightly larger memory footprint for API
>>> compatibility sounds reasonable to me.
>>
>> I don't care about cases when Unsafe API is abused (e.g. raw memory
>> writes on absolute address or arbitrary offset in an object). In the
>> end, it's unsafe API, right? :-)
>
> Yeah, but with millions of users, we are in a bit of a (implicit)
> compatibility bind here ;)

That's why I deliberately tried to leave the compatibility discussion
aside for now :-)

Unsafe is unique: it's not a supported API, but nonetheless many people
rely on it. It means we can't throw it away (even in a major release),
but we are still not as constrained as with an official public API.

As part of Project Jigsaw there's already an attempt to make an
incompatible change to the Unsafe API. Depending on how it goes, we can get
some insight into how to address compatibility concerns (e.g. preserve the
original behavior in a Java 8 compatibility mode).

What I'm trying to understand right now, before diving into
compatibility details, is whether the Unsafe API allows the offset encoding
scheme to change at all, and what can be done to make it happen.
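
As a rough illustration of what an encoded offset could look like, the
sketch below tags the low bit of a cookie to mark final fields. The
encoding and all names are hypothetical; this is not what HotSpot or
Unsafe do. The point is only that such a cookie stops being usable as a
plain byte offset, so any caller doing byte arithmetic on it breaks.

```java
// Hypothetical offset-cookie encoding: (byteOffset << 1) | finalBit.
final class OffsetCookie {
    private static final long FINAL_BIT = 1L;

    // Packs a byte offset and a finality flag into one opaque value.
    static long encode(long byteOffset, boolean isFinal) {
        return (byteOffset << 1) | (isFinal ? FINAL_BIT : 0L);
    }

    // Cheap local check: no need to consult class metadata.
    static boolean isFinal(long cookie) {
        return (cookie & FINAL_BIT) != 0;
    }

    // Recovers the real byte offset from the cookie.
    static long byteOffset(long cookie) {
        return cookie >>> 1;
    }

    public static void main(String[] args) {
        long cookie = encode(12, true);
        System.out.println(isFinal(cookie));        // true
        System.out.println(byteOffset(cookie));     // 12
        // Adding 4 to the cookie does not address byte offset 16.
        System.out.println(byteOffset(cookie + 4)); // 14
    }
}
```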

Though the offset value is explicitly described in the API as an opaque
offset cookie, I spotted 2 inconsistencies in the API itself:

   * Unsafe.get/set*Unaligned() require absolute offsets.
These methods were added in 9, so they haven't leaked into public use yet.

Andrew, can you comment on why you decided to stick with absolute
offsets rather than preserve the Unsafe.getInt() addressing scheme?

   * Unsafe.copyMemory()
Source and destination addressing operate on offset cookies, but the
amount of copied data is expressed in bytes. In order to do bulk copies of
consecutive memory blocks, the user should be able to convert offset
cookies to byte offsets and vice versa. There's no way to do that with
the current API.

Are you aware of any other use cases where people rely on absolute offsets?

I thought about VarHandles a bit and it seems they aren't a silver
bullet: they would have to be based on Unsafe (or a stripped-down Unsafe
equivalent) anyway.

Unsafe.fireDepChange is a viable option for Reflection and
MethodHandles. I'll consider it during further explorations. The
downside is that it puts the responsibility for tracking final field
changes on the user, which is error-prone. There are places in the JDK
where Unsafe is used directly, and each of them would have to be analyzed
case by case to see whether a final field is updated.

It's basically an opt-in vs opt-out choice. I'd prefer the cleaner
approach, if there's a solution for the compatibility issues.
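
A minimal sketch of the opt-in shape, assuming a hook that plays the
role of the proposed Unsafe.fireDepChange. No such method exists today;
DepChangeHook and FinalAwareSetter are hypothetical names, and plain
reflection stands in for the Unsafe write. The wrapper reuses what it
already knows (the Field's modifiers) instead of decoding it from an
offset cookie.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class FinalAwareSetterDemo {
    // Stand-in for the proposed notification; hypothetical.
    interface DepChangeHook {
        void fireDepChange(Field f, Object holder);
    }

    static final class FinalAwareSetter {
        private final DepChangeHook hook;

        FinalAwareSetter(DepChangeHook hook) {
            this.hook = hook;
        }

        void setInt(Object holder, Field f, int value) {
            try {
                f.setAccessible(true);
                // Opt-in: only writes known to hit a final field notify.
                // Whether to notify before or after the write is a design
                // question this sketch does not settle.
                if (Modifier.isFinal(f.getModifiers())) {
                    hook.fireDepChange(f, holder);
                }
                f.setInt(holder, value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Hypothetical target class with one final and one mutable field.
    static final class Point {
        final int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Writes both fields and returns the names of fields that triggered
    // a notification.
    static List<String> run() {
        List<String> notified = new ArrayList<>();
        FinalAwareSetter setter =
            new FinalAwareSetter((f, holder) -> notified.add(f.getName()));
        Point p = new Point(1, 2);
        try {
            setter.setInt(p, Point.class.getDeclaredField("x"), 10);
            setter.setInt(p, Point.class.getDeclaredField("y"), 20);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
        return notified;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Any direct Unsafe user that forgets to call the hook silently skips the
notification, which is the error-prone part.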

>> So, my next question is how to proceed. Does changing API and providing
>> 2 set of functions working with absolute and encoded offsets solve the
>> problem? Or leaving Unsafe as is (but clarifying the API) and migrating
>> Reflection/j.l.i to VarHandles solve the problem? That's what I'm trying
>> to understand.
>
> I would think Reflection/j.l.i would eventually migrate to VarHandles
> anyway. Paul? The interim solution for encoding final field flags
> shouldn't leak into (even Unsafe) API, or at least should not break the
> existing APIs.
>
> I further think that an interim solution makes auxiliary single
> Unsafe.fireDepChange(Field f / long addr) or something, and uses it
> along with the Unsafe calls in Reflection/j.l.i, when wrappers know they
> are dealing with final fields. In other words, should we try to reuse
> the knowledge those wrappers already have, instead of trying to encode
> the same knowledge into offset cookies?
>
>>>> II. Managing relations between final fields and nmethods
>>>> Another aspect is how expensive dependency checking becomes.
>
>>> Isn't the underlying problem being the dependencies are searched
>>> linearly? At least in ConstantFieldDep, can we compartmentalize the
>>> dependencies by holder class in some sort of hash table?
>> In some cases (when coarse-grained (per-class) tracking is used), linear
>> traversal is fine, since all nmethods will be invalidated.
>>
>> In order to construct a more efficient data structure, you need a way to
>> order or hash oops. The problem with that is oops aren't stable - they
>> can change at any GC. So, either some stable value should be associated
>> with them (System.identityHashCode()?) or dependency tables should be
>> updated on every GC.
>
> Yeah, like Symbol::_identity_hash.
Symbol is an internal VM entity. Oops are different: they are just
pointers to Java objects (OOP = Ordinary Object Pointer). The only feasible
way is to piggyback on the object hash code. I won't dive into details
here, but there are many intricate consequences.
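
At the Java level, this is the idea IdentityHashMap builds on: the
identity hash of an object stays the same for its lifetime even when the
GC moves it, so it can index a table where a raw address cannot. The
sketch below is only an analogy. InstanceDeps and the string "nmethod"
names are hypothetical, and none of the VM-side costs are modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Per-instance dependency table keyed by object identity.
final class InstanceDeps {
    // Note: this map keeps holders strongly reachable; a real table
    // would need a different lifetime policy.
    private final Map<Object, List<String>> deps = new IdentityHashMap<>();

    // Records that the given compiled method depends on a value folded
    // from this particular holder instance.
    void record(Object holder, String nmethod) {
        deps.computeIfAbsent(holder, k -> new ArrayList<>()).add(nmethod);
    }

    // Looks up the dependents of one instance without scanning the
    // dependencies of every other instance.
    List<String> invalidate(Object holder) {
        List<String> hit = deps.remove(holder);
        return hit == null ? Collections.emptyList() : hit;
    }

    public static void main(String[] args) {
        InstanceDeps table = new InstanceDeps();
        Object a = new String("k");
        Object b = new String("k"); // equal to a, but a distinct instance
        table.record(a, "m1");
        table.record(b, "m2");
        System.out.println(table.invalidate(a)); // [m1]
        System.out.println(table.invalidate(b)); // [m2]
    }
}
```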

>> Unless existing machinery can be sped up to appropriate level, I
>> wouldn't consider complicating things so much.
>
> Okay. I just can't escape the feeling we keep band-aiding the linear
> searches everywhere in VM on case-to-case basis, instead of providing
> the asymptotic guarantees with better data structures.
Well, class-based dependency contexts have been working pretty well for 
KlassDeps. They worked pretty well for CallSiteDeps as well, once a more 
specific context was used (I introduced a specialized CallSite 
instance-based implementation because it is simpler to maintain).

It's hard to come up with a narrow enough class context for
ConstantFieldDeps, so it's probably a good time to consider a
different approach to indexing nmethod dependencies. But assuming final
field updates are rare (with the exception of deserialization), it may
not be that important.
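
For comparison, a class-keyed context can be sketched as follows. It is
a Java-level analogy with hypothetical names, not the HotSpot data
structure. It shows why the context is coarse for this use: a store to
one instance's final field flushes every dependency recorded for the
holder class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Coarse-grained dependency contexts keyed by holder class.
final class ClassContextDeps {
    private final Map<Class<?>, List<String>> contexts = new HashMap<>();

    // Records a compiled method in the context of the holder class.
    void record(Class<?> holderClass, String nmethod) {
        contexts.computeIfAbsent(holderClass, k -> new ArrayList<>()).add(nmethod);
    }

    // A final field store in any instance of holderClass invalidates
    // the whole context, so a linear walk of it loses nothing.
    List<String> invalidateAll(Class<?> holderClass) {
        List<String> hit = contexts.remove(holderClass);
        return hit == null ? new ArrayList<>() : hit;
    }

    public static void main(String[] args) {
        ClassContextDeps deps = new ClassContextDeps();
        deps.record(String.class, "m1");
        deps.record(String.class, "m2");
        deps.record(Integer.class, "m3");
        System.out.println(deps.invalidateAll(String.class));  // [m1, m2]
        System.out.println(deps.invalidateAll(Integer.class)); // [m3]
    }
}
```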

>> The 3 optimizations I initially proposed allow to isolate
>> ConstantFieldDep from other kinds of dependencies, so dependency
>> traversal speed will affect only final field writes. Which is acceptable
>> IMO.
>
> Except for an overwhelming number of cases where the final field stores
> happen in the course of deserialization. What's particularly bad about
> this scenario is that you wouldn't see the time burned in the VM unless
> you employ the native profiler, as we discovered in Nashorn perf work.
Yes, deserialization is a good example. It's special because it operates
on freshly created objects, which, as you noted, haven't escaped yet.
It'd be nice if the VM could skip dependency checking in such cases (either
automatically or with explicit hints).

In order to diagnose performance problems with excessive dependency
checking, the VM can monitor it closely (UsePerfData counters + JFR events +
tracing should provide enough information to spot issues).

> Recapping the discussion in this thread, I think we would need to have a
> more thorough performance work for this change, since it touches the
> very core of the platform. I think many people outside the
> hotspot-compiler-dev understand some corner intricacies of the problem
> that we miss. JEP and outcry for public comments, maybe?
Yes, I planned to get quick feedback on the list and then file a JEP as 
a followup.

Thanks again for the feedback, Aleksey!

Best regards,
Vladimir Ivanov