On constant folding of final field loads

Mon Jul 20 23:05:07 UTC 2015

On Jun 30, 2015, at 12:00 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Aleksey,
> 
>>>> Big picture question: do we actually care about propagating final field
>>>> values once the object escaped (and in this sense, available to be
>>>> introspected by the compiler)?
>>>> 
>>>> Java memory model does not guarantee the final field visibility when the
>>>> object had escaped. The very reason why deserialization works is because
>>>> the deserialized object had not yet been published.
>>>> 
>>>> That is, are we in line with the spec and general expectations by
>>>> folding the final values, *and* not deoptimizing on the store?
>>> Can you elaborate on your point and interaction with JMM a bit?
>>> 
>>> Are you talking about not tracking constant folded final field values at
>>> all, since there are no guarantees by JMM such updates are visible?
>> 
>> Yup. AFAIU the JMM, there are no guarantees you would see the updated
>> value for final field after the object had leaked. So, spec-wise you may
>> just use the final field values as constants. I think the only reason
>> you have to do the dependency tracking is when constant folding depends
>> on instance identity.
>> 
>> So, my question is, do we knowingly make a goodwill call to deopt on
>> final field store, even though it is not required by spec? I am not
>> opposing the change, but I'd like us to understand the implications better.
> That's a good question.

I believe that the JMM doesn't give users any hope for changing the
value of a final field, apart from objects under initialization, and
the specific cases of System.setErr and its two evil twins.  (Did I
miss a third case?  There aren't many.)

Rather than wait for the unthinkable and throw a de-opt, I would
prefer to make more positive checks against final field changing, and
throw a suitable exception when (if ever) an application sets a final
field not in a scenario envisioned by the JMM.  The value of this
would be that whatever context markers or annotations we use to make
these checks will also help guide the *suppression* of the final field
folding optimization.

> I consider it more like a quality of implementation aspect. Neither Reflection nor Unsafe APIs are part of JVM/JLS spec, so I don't think possibility of final field updates should be taken into account there.

Reflection allows apps. to emulate the source semantics of Java
programs, and (independently) it provides access to some run-time
metadata.  Whatever it does with final should correspond (within
reason) to source semantics.  Unsafe is whatever we want it to be, as
a simple, well-factored set of building blocks to implement low-level
JVM operations and (independently) provide access to some run-time
features of the hardware platform.  Therefore, Unsafe and Reflection
are partially coupled to final semantics.

With that said, I think it may be undesirable to push final-bit
checking into the Unsafe API.  Unsafe loads and stores should map to
single memory instructions (with doubly-indexed, unscaled addresses).
If we add extra "tag" bits to (say) offsets, we will have to "untag"
those offsets when the instruction executes (if the offsets are not
JIT-time constants); that is an extra instruction.

> In order to avoid surprises and inconsistencies (old value vs new value depending on execution path) which are *very* hard to track down, VM should either completely forbid final field changes or keep track of them and adapt accordingly.

I like the "forbid" option, also known as "fail fast".  I think (in general)
we should (where we can) remove indeterminate behavior from the
JVM specification, such as "what happens when I store a new value
to a final at an unexpected time".

We have enough bits in the object header to encode frozen-ness.
This is an opposite property:  slushiness-of-finals.  We could require
that the newInstance operation used by deserialization would create
slushy objects.  (The normal new/<init> sequence doesn't need this.)
Ideally, we would want the deserializer to issue an explicit "publish"
operation, which would clear the slushy flag.  JITs would consult
that flag to gate final-folding.  Reflection (and other users of
Unsafe) would consult the flag and throw a fail-fast error if the
flag was clear.  There would have to be some way to limit the time
an object is in the slushy state, ideally by enforcing an error
on deserializers who neglect to publish (or discard) a slushy
object.  For example, we could require an annotation on
deserialization methods, as we do today on caller-sensitive
methods.

That's the sort of thing I would prefer to see, to remove
indeterminate behavior.
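
A toy model of the slushy-bit life cycle described above, as a sketch only.
It assumes the flag is an ordinary boolean field rather than a header bit,
and the names SlushyBox, newSlushyInstance, setFinal, publish and foldable
are hypothetical; nothing like this exists in the JDK.

```java
// Toy model: the "slushy" state is a plain field here, not an object-header bit.
final class SlushyBox {
    private int value;              // stands in for a final instance field
    private boolean slushy = true;  // set while the object is still being rebuilt

    // Hypothetical counterpart of the newInstance used by deserialization.
    static SlushyBox newSlushyInstance() {
        return new SlushyBox();
    }

    // What Reflection (or another Unsafe user) would do: check, then store.
    void setFinal(int newValue) {
        if (!slushy) {
            // Fail fast instead of leaving the outcome indeterminate.
            throw new IllegalStateException("final field written after publish");
        }
        value = newValue;
    }

    // The explicit "publish" operation: clears the flag, freezing the finals.
    SlushyBox publish() {
        slushy = false;
        return this;
    }

    // What a JIT would consult before folding a load of the field.
    boolean foldable() {
        return !slushy;
    }

    int value() {
        return value;
    }
}
```

A deserializer in this model would call newSlushyInstance(), then setFinal()
as often as it needs, then publish(); any later setFinal() throws.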

> 
>> For example, I can see the change gives rise to some interesting
>> low-level coding idioms, like:
>> 
>> final boolean running = true;
>> Field runningField = resolve(...); // reflective
>> 
>> // run stuff for minutes
>> void m() {
>>   while (running) { // compiler hoists, turns into while(true)
>>      // do stuff
>>   }
>> }
>> 
>> void hammerTime() {
>>   runningField.set(this, false); // deopt, break the loop!
>> }
>> 
>> Once we allow users to go crazy like that, it would be cruel to
>> retract/break/change this behavior.

You can simulate this (very interesting) pattern using the "target" variable
of a MutableCallSite.  I.e., the "fold then deopt" use case is supported
by MutableCallSite.setTarget.
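
A minimal sketch of that simulation, using only the public java.lang.invoke
API.  The class and method names (RunLoop, spin, hammerTime) are made up for
illustration, and the loop carries an iteration limit only so that the
example terminates.  Whether the JIT really folds the flag depends on the
VM; the handle is kept in a static final so that it is at least eligible.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

final class RunLoop {
    // The call site's target plays the role of the "running" variable.
    private static final MutableCallSite RUNNING_SITE =
            new MutableCallSite(MethodHandles.constant(boolean.class, true));

    // Static final handle, so the JIT may treat it as a constant.
    private static final MethodHandle RUNNING = RUNNING_SITE.dynamicInvoker();

    static boolean running() {
        try {
            return (boolean) RUNNING.invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    // Counts iterations until the flag flips or the limit is reached.
    static int spin(int limit) {
        int i = 0;
        while (i < limit && running()) {
            i++;
        }
        return i;
    }

    // Retargets the site; compiled code that folded the old target is invalidated.
    static void hammerTime() {
        RUNNING_SITE.setTarget(MethodHandles.constant(boolean.class, false));
        // Make the new target visible to other threads.
        MutableCallSite.syncAll(new MutableCallSite[] { RUNNING_SITE });
    }
}
```

Before hammerTime() is called, spin(n) runs to its limit; afterwards it
returns immediately.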

Those variable semantics should *not* be overloaded on final.
That pattern, if driven by a special variable, deserves a new
kind of variable.

The key parameters would be 1) allowed state transitions between
blank, set, reset, dead, and 2) expected frequency of various
transitions.  The frequencies are guesses, not user contracts,
and the JVM would have to measure and retune to cope with
surprises.

(One thing I wonder about:  What could a "volatile final" be?
My best suggestion is a moderate extension of blank finals:
  http://cr.openjdk.java.net/~jrose/draft/lazy-final.html )

>> But I speculate those cases are not pervasive. By and large, people care
>> about final ops to jump through the barriers. For example, the final
>> load can be commonned through the acquires / control flow. See e.g.:
>>  http://psy-lob-saw.blogspot.ru/2014/02/when-i-say-final-i-mean-final.html
>>
>>>>> Regarding alternative approaches to track the finality, an offset bitmap
>>>>> on per-class basis can be used (containing locations of final fields).
>>>>> Possible downsides are: (1) memory footprint (1/8th of instance size per
>>>>> class); and (2) more complex checking logic (load a relevant piece of a
>>>>> bitmap from a klass, instead of checking locally available offset
>>>>> cookie). The advantage is that it is completely transparent to a user:
>>>>> it doesn't change offset translation scheme.
>>>> 
>>>> I like this one. Paying with slightly larger memory footprint for API
>>>> compatibility sounds reasonable to me.
>>> 
>>> I don't care about cases when Unsafe API is abused (e.g. raw memory
>>> writes on absolute address or arbitrary offset in an object). In the
>>> end, it's unsafe API, right? :-)

Today's abuse = tomorrow's use.  Whatever we might want to do with
a memory instruction is a possible valid use for Unsafe.  For Project
Panama I expect we will be using the managed heap to store temporary
native values.  The envelope will be something like a new long[2],
but the layout (after the envelope header = array base) will *not*
be something the JVM knows about in detail; it will be sliced up
by Unsafe operations into native bits and bytes.  And likewise with
malloc-buffers (where the whole VA is stuffed in the offset).

>> Yeah, but with millions of users, we are in a bit of a (implicit)
>> compatibility bind here ;)
> 
> That's why I deliberately tried to omit compatibility aspect discussion for now :-)
> 
> Unsafe is unique: it's not a supported API, but nonetheless many people rely on it. It means we can't throw it away (even in a major release), but still we are not as limited as with official public API.
> 
> As part of Project Jigsaw there's already an attempt to do an incompatible change for Unsafe API. Depending on how it goes, we can get some insights how to address compatibility concerns (e.g. preserve original behavior in Java 8 compatibility mode).
> 
> What I'm trying to understand right now, before diving into compatibility details, is whether Unsafe API allows offset encoding scheme change itself and what can be done to make it happen.

The decision to make offsets opaque was mine; the idea was to hide more
details of object layout, for example in case the JVM ever used non-flat
layouts for objects.  (It never has.  It might for objects containing larger
value types; we don't know yet.)  At least some offsets want to
occur in arithmetic sequences (arrays and now misaligned accesses).

Of course 64 bits can encode a lot of stuff, so it would be possible to mix
together both symbolic information (type tags, finality and other mode tags)
with pure offset or address information.  Over 32 bits of arithmetic sequence
range can co-exist with this, by putting the tags at either end of the word.
But (back to my earlier comment) this makes it hard to compile Unsafe ops
as single instructions.

Folding tags into offsets will make it harder for Panama-type APIs to perform
address arithmetic (they will have to work around the tags).  The Unsafe
API would have to expose operations like offsetAdd(long o, int delta) and
offsetDifference(long o1, long o2).
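
To make the cost concrete, here is a sketch of what a high-tag encoding and
those two helpers might look like.  The layout is an assumption made up for
this example (tag bits in the top byte, a single FINAL_TAG bit, 56 bits of
raw offset); only the offsetAdd and offsetDifference signatures come from
the paragraph above.

```java
final class TaggedOffsets {
    // Assumed layout: top 8 bits hold tags, low 56 bits hold the raw offset.
    static final int TAG_SHIFT = 56;
    static final long OFFSET_MASK = (1L << TAG_SHIFT) - 1;
    static final long FINAL_TAG = 1L << TAG_SHIFT;

    // Builds a cookie from a raw byte offset and a finality bit.
    static long tag(long rawOffset, boolean isFinal) {
        if ((rawOffset & ~OFFSET_MASK) != 0) {
            throw new IllegalArgumentException("offset does not fit under the tag");
        }
        return isFinal ? (rawOffset | FINAL_TAG) : rawOffset;
    }

    // The "untag" step: the extra instruction paid on every non-constant access.
    static long rawOffset(long cookie) {
        return cookie & OFFSET_MASK;
    }

    static boolean isFinal(long cookie) {
        return (cookie & FINAL_TAG) != 0;
    }

    // Address arithmetic that steps around the tag bits and preserves them.
    static long offsetAdd(long o, int delta) {
        long raw = rawOffset(o) + delta;
        if ((raw & ~OFFSET_MASK) != 0) {
            throw new ArithmeticException("offset moved into the tag bits");
        }
        return (o & ~OFFSET_MASK) | raw;
    }

    // Distance in bytes between two cookies, ignoring their tags.
    static long offsetDifference(long o1, long o2) {
        return rawOffset(o1) - rawOffset(o2);
    }
}
```

With untagged offsets, both helpers collapse to plain + and -, which is the
point of the "no tag or separate tag" preference.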

> Though offset value is explicitly described in API as an opaque offset cookie, I spotted 2 inconsistencies in the API itself:
> 
>  * Unsafe.get/set*Unaligned() require absolute offsets;
> These methods were added in 9, so haven't leaked into public yet.

Yep.  That seems to push for a high-tag (color bits in the MSB of
the offset), or (my preference) no tag or separate tag.
You could also copy the alignment bits into the LSB to co-exist
with a tag.

(The "separate tag" option means something like having a
query for the "tag" as well as the base and offset of a variable.
The operations getInt, etc., would take an optional third argument,
which would be the tag associated with the base and offset.
This would allow address arithmetic to remain trivial, at
the expense of retooling uses of Unsafe that need to be
sensitive to tagging concerns.)

> Andrew, can you comment on why you decided to stick with absolute offsets and not preserving Unsafe.getInt() addressing scheme?

(The outcome is that the unaligned guys have the same signatures as the aligned ones.)

>  * Unsafe.copyMemory()
> Source and destination addressing operate on offset cookies, but amount of copied data is expressed in bytes. In order to do bulk copies of consecutive memory blocks, the user should be able to convert offset cookies to byte offset and vice versa. There's no way to do that with current API.

Right.

> Are you aware of any other use cases when people rely on absolute offsets?
> 
> I thought about VarHandles a bit and it seems they aren't a silver bullet - they should be based on Unsafe (or stripped Unsafe equivalent) anyway.
> 
> Unsafe.fireDepChange is a viable option for Reflection and MethodHandles. I'll consider it during further explorations. The downside is that it puts responsibility of tracking final field changes on a user, which is error-prone. There are places in JDK where Unsafe is used directly and they should be analyzed whether a final field is updated or not on a case-by-case basis.

Idea:  If we go with a three-argument version of getInt, the legacy two-argument
version could do a more laborious check.  The best way to motivate users of
Unsafe to refresh their code (probably) is to improve performance.  Recovering
lost performance (due to increased safety) is a tactic we can use too, although
it is less enjoyable all around.

I wish we had value types already; we could make a lot of this clearer if we
were able to give cookies their own opaque 64-bit type.

(Hacky idea:  Use "double" as an envelope type for a second kind of cookie,
since "long" is taken.  Hacky idea killer:  There is an implicit conversion from
long to double, which is probably harmful.)
> 
> It's basically opt-in vs opt-out approaches. I'd prefer a cleaner approach, if there's a solution for compatibility issues.
> 
>>> So, my next question is how to proceed. Does changing API and providing
>>> 2 set of functions working with absolute and encoded offsets solve the
>>> problem? Or leaving Unsafe as is (but clarifying the API) and migrating
>>> Reflection/j.l.i to VarHandles solve the problem? That's what I'm trying
>>> to understand.
>> 
>> I would think Reflection/j.l.i would eventually migrate to VarHandles
>> anyway. Paul? The interim solution for encoding final field flags
>> shouldn't leak into (even Unsafe) API, or at least should not break the
>> existing APIs.
>> 
>> I further think that an interim solution makes auxiliary single
>> Unsafe.fireDepChange(Field f / long addr) or something, and uses it
>> along with the Unsafe calls in Reflection/j.l.i, when wrappers know they
>> are dealing with final fields. In other words, should we try to reuse
>> the knowledge those wrappers already have, instead of trying to encode
>> the same knowledge into offset cookies?
>>
>>>>> II. Managing relations between final fields and nmethods
>>>>> Another aspect is how expensive dependency checking becomes.
>> 
>>>> Isn't the underlying problem that the dependencies are searched
>>>> linearly? At least in ConstantFieldDep, can we compartmentalize the
>>>> dependencies by holder class in some sort of hash table?
>>> In some cases (when coarse-grained (per-class) tracking is used), linear
>>> traversal is fine, since all nmethods will be invalidated.
>>> 
>>> In order to construct a more efficient data structure, you need a way to
>>> order or hash oops. The problem with that is oops aren't stable - they
>>> can change at any GC. So, either some stable value should be associated
>>> with them (System.identityHashCode()?) or dependency tables should be
>>> updated on every GC.
>> 
>> Yeah, like Symbol::_identity_hash.
> Symbol is an internal VM entity. Oops are different. They are just pointers to Java object (OOP = Ordinary Object Pointer). The only doable way is piggyback on object hash code. I won't dive into details here, but there are many intricate consequences.

We sometimes use binary search instead of identity_hash, e.g., in the CI.
We could create a data structure which carries a GC generation counter,
and re-sort lazily as needed.  In principle, the GC could help with re-sorting.
The API could look like a fixed-sized table containing a pair of aligned arrays:
Object[] key, int[] value.  The arrays would have to be encapsulated, since
they can change order at any moment (if GC kicks in), but it would be
reasonable to work with snapshots of them for bulk queries.  The
underlying arrays could be ordinary Java arrays, perhaps with blocking
to facilitate growth. Native methods or specially-marked non-interruptible
methods would perform the required transactions.  Access cost would be
O(log(N)).  This feels like it might be useful for things besides dependencies.
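
A rough simulation of that table, to show the shape of the idea rather than
a real implementation.  Java code cannot observe object addresses or GC
cycles, so both are passed in as callbacks (addressOf and gcEpoch, both
hypothetical), and "synchronized" stands in for the native or
non-interruptible methods.  The class name AddressSortedTable is also made
up.  The sketch assumes distinct addresses and does not handle repeated
keys.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

final class AddressSortedTable {
    private final Object[] keys;   // aligned with values, encapsulated
    private final int[] values;
    private final ToLongFunction<Object> addressOf; // simulated object address
    private final LongSupplier gcEpoch;             // simulated GC generation counter
    private int size;
    private long sortedAtEpoch;
    private boolean dirty;

    AddressSortedTable(int capacity, ToLongFunction<Object> addressOf, LongSupplier gcEpoch) {
        this.keys = new Object[capacity];
        this.values = new int[capacity];
        this.addressOf = addressOf;
        this.gcEpoch = gcEpoch;
        this.sortedAtEpoch = gcEpoch.getAsLong();
    }

    synchronized void put(Object key, int value) {
        if (size == keys.length) throw new IllegalStateException("table full");
        keys[size] = key;
        values[size] = value;
        size++;
        dirty = true; // sorted lazily on the next lookup
    }

    // Binary search by current address: O(log N) once the table is sorted.
    synchronized int get(Object key, int missing) {
        resortIfNeeded();
        long target = addressOf.applyAsLong(key);
        int lo = 0, hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long a = addressOf.applyAsLong(keys[mid]);
            if (a < target) lo = mid + 1;
            else if (a > target) hi = mid - 1;
            else return keys[mid] == key ? values[mid] : missing;
        }
        return missing;
    }

    // Re-sorts both arrays if entries were added or a "GC" has moved objects.
    private void resortIfNeeded() {
        long now = gcEpoch.getAsLong();
        if (!dirty && now == sortedAtEpoch) return;
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> addressOf.applyAsLong(keys[i])));
        Object[] k = new Object[size];
        int[] v = new int[size];
        for (int i = 0; i < size; i++) {
            k[i] = keys[order[i]];
            v[i] = values[order[i]];
        }
        System.arraycopy(k, 0, keys, 0, size);
        System.arraycopy(v, 0, values, 0, size);
        sortedAtEpoch = now;
        dirty = false;
    }
}
```

In a real VM the re-sort would need to be atomic with respect to GC, which
this simulation glosses over.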

>>> Unless existing machinery can be sped up to appropriate level, I
>>> wouldn't consider complicating things so much.
>> 
>> Okay. I just can't escape the feeling we keep band-aiding the linear
>> searches everywhere in VM on a case-by-case basis, instead of providing
>> the asymptotic guarantees with better data structures.
> Well, class-based dependency contexts have been working pretty well for KlassDeps. They worked pretty well for CallSiteDeps as well, once a more specific context was used (I introduced a specialized CallSite instance-based implementation because it is simpler to maintain).
> 
> It's hard to come up with a narrow enough class context for ConstantFieldDeps, so, probably, it's a good time to consider a different approach to index nmethod dependencies. But assuming final field updates are rare (with the exception of deserialization), it may not be that important.
> 
>>> The 3 optimizations I initially proposed make it possible to isolate
>>> ConstantFieldDep from other kinds of dependencies, so dependency
>>> traversal speed will affect only final field writes. Which is acceptable
>>> IMO.
>> 
>> Except for an overwhelming number of cases where the final field stores
>> happen in the course of deserialization. What's particularly bad about
>> this scenario is that you wouldn't see the time burned in the VM unless
>> you employ the native profiler, as we discovered in Nashorn perf work.
> Yes, deserialization is a good example. It's special because it operates on freshly created objects, which, as you noted, haven't escaped yet. It'd be nice if VM can skip dependency checking in such case (either automatically or with explicit hints).

Agree.  The "slushy bit" might help.  Hard part:  It would have to co-exist
with identityHashCode (because deserialization uses that also IIRC).

— John

> In order to diagnose performance problems with excessive dependency checking, VM can monitor it closely (UsePerfData counters + JFR events + tracing should provide enough information to spot issues).
> 
>> Recapping the discussion in this thread, I think we would need to have a
>> more thorough performance work for this change, since it touches the
>> very core of the platform. I think many people outside the
>> hotspot-compiler-dev understand some corner intricacies of the problem
>> that we miss. JEP and outcry for public comments, maybe?
> Yes, I planned to get quick feedback on the list and then file a JEP as a followup.
> 
> Thanks again for the feedback, Aleksey!
> 
> Best regards,
> Vladimir Ivanov