On constant folding of final field loads

Sat Jun 27 01:27:08 UTC 2015

Hi there,

Recently I started looking at constant folding of loads from instance 
final fields:
   https://bugs.openjdk.java.net/browse/JDK-8058164

I made some progress and wanted to share my findings and initiate a 
discussion about the problems I spotted.

Current prototype:
   http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
   http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk

The idea is simple: JIT tracks final field changes and throws away 
nmethods which are affected.

There are 2 parts of the problem:
   - how to track changes of final field
   - how to manage relation between final fields and nmethods;

I. Tracking changes of final fields

There are 4 ways to circumvent runtime limitations and change a final 
field value:
   - Reflection API (Field.setAccessible())
   - Unsafe
   - JNI
   - java.lang.invoke (MethodHandles)

(It's also possible to write to a final field in a constructor, but I 
consider it as a corner case and haven't addressed yet. VM can ignore )

Since Reflection & java.lang.invoke APIs use Unsafe, it ends up with 
only 2 cases: JNI & Unsafe.

For JNI it's possible to encode field "finality" in jfieldID and check 
corresponding bit in Set*Field JNI methods before updating a field. 
There are already some data encoded in the field ID, so extending it to 
record final bit as well went pretty smooth.

For Unsafe it's much more complex.
I started with a similar approach (mostly implemented in the current 
prototype) - record "finality" bit in offset cookie and check it when 
performing a write.

Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly 
states that returned value is not guaranteed to be a byte offset [1], 
after following that road I don't see how offset encoding scheme can be 
changed.

First of all, there are Unsafe.get*Unaligned methods (added in 9), which 
require byte offset (Unsafe.getLong()):
   "Fetches a value at some byte offset into a given Java object
    ...
    The specification of this method is the same as {@link
    #getLong(Object, long)} except that the offset does not need to
    have been obtained from {@link #objectFieldOffset} on the
    {@link java.lang.reflect.Field} of some Java field."

Unsafe.getInt supports 3 addressing modes:
   (1) NULL + address
   (2) oop + offset
   (3) base + index * scale

Since there are no methods in Unsafe to get byte offsets, there's no way 
to make (3) work with non-byte offsets for Unaligned versions. Both base 
and scale should be either byte offsets or offset cookies to make things 
work. You can get a sense of the problems looking into Unsafe & java.nio 
hacks I did to make things somewhat function after switching offset 
encoding strategy.

Also, Unsafe.copyMemory() doesn't work well with offset cookies (see 
java.nio.Bits changes I did). Though source and destination addressing 
shares the same mode with Unsage.getInt() et al., the size of the copied 
region is defined in bytes. So, in order to perform bulk copies of 
consecutive memory blocks, the user should be able to convert offset 
cookie to byte offset and vice versa. There's no way to solve that with 
current API right now.

I don't want to touch compatibility concerns of switching from byte 
offsets to encoded offsets, but it looks like Unsafe API needs some 
overhaul in 9 to make offset encoding viable.

More realistically, since there are external dependencies on Unsafe API, 
I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when 
they are available in 9) all over JDK. Or temporarily make a private 
copy (finally :-)) of field accessors from Unsafe, switch it to encoded 
offsets, and use it in Reflection & java.lang.invoke API.

Regarding alternative approaches to track the finality, an offset bitmap 
on per-class basis can be used (containing locations of final fields). 
Possible downsides are: (1) memory footprint (1/8th of instance size per 
class); and (2) more complex checking logic (load a relevant piece of a 
bitmap from a klass, instead of checking locally available offset 
cookie). The advantage is that it is completely transparent to a user: 
it doesn't change offset translation scheme.

II. Managing relations between final fields and nmethods

Nmethods dependencies suits that purpose pretty well, but some 
enhancements are needed.

I envision 2 types of dependencies: (1) per-class (field holder); and 
(2) per-instance (value holder). Field holder is used as a context.

Unless a final field is changed, there's no need to track per-instance 
dependency. VM optimistically starts in per-class mode and switch to 
per-instance mode when it sees a field change. The policy is chosen on 
per-class basis. VM should be pretty conservative, since false positives 
are expensive - a change of unrelated field causes recompilation of all 
nmethods where the same field was inlined (even if the value was taken 
from a different instance). Aliasing also causes false positives (same 
instance, but different final field), so fields in the same class should 
be differentiated as well.

Unilke methods, fields don't have any dedicated metadata associated with 
them. All data is confined in holder klass. To be able to identify a 
field in a dependency, byte offset can be used. Right now, dependency 
management machinery supports only oops and metadata. So, it should be 
extended to support primitive values in dependencies (haven't done yet).

Byte offset + per-instance modes completely eliminates false positives.

Another aspect is how expensive dependency checking becomes.

I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle 
inlining heavily relies on constant folding of instance final fields.

                     Before   After
checks (#)          420       12,5K
nmethods checked(#)  3K       1,5M
total time:         60ms       2s
deps total          19K        26K

Though total number of dependencies in VM didn't change much (+37% = 
19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) and 
time spent on dependency checking (30x: 60ms -> 2s) dramatically increased.

The reason is that constant field value dependencies created heavily 
populated contextes which are regularly checked:

      #1                #2    #3/#4
Before
   KlassDep            254    47/2,632
   CallSiteDep         167    46/  358

After
   ConstantFieldDep 11,790     0/1,494,112
   KlassDep            286    41/    2,769
   CallSiteDep         249    58/      393

(#1 - dependency kind; #2 - total number of unique dependencies;
#3/#4 - invalidated nmethods/checked dependencies)

I have 3 ideas how to improve performance of dependency checking:

   (1) split dependency context list (nmethodBucket) into 3 independent 
lists (Klass, CallSite & ConstantValue); (IMPLEMENTED)

It trades size for speed - duplicate nmethods are possible, but the 
lists should be shorter on average. I already implemented it, but it 
didn't improve the benchmark I'm playing with, since the fraction of 
CallSite/Klass deps is very small compared to ConstantField.

   (2) group nmethodBucket entries into chunks of k-nmethods; (TODO)

It should improve nmethod iteration speed in heavily populated contexts.

   (3) iterate only dependencies of appropriate kind; (TODO)

There are 3 kinds of changes which require dependency checking: changes 
in CHA (KlassDepChange), call site target change (CallSiteDepChange), 
and constant field value change (ConstantFieldDepChange). Different 
types of changes affect disjoint sets of dependencies. So, instead of 
enumerating all dependencies in a nmethod, a more focused approach can 
be used (e.g. check only call_site_target_value deps for 
CallSiteDepChange).

Since dependencies are sorted by type when serialized in a nmethod, it's 
possible to compute offsets for 3 disjoint sets and use them in 
DepStream to iterate only relevant dependencies.

I hope it'll significantly reduce dependency checking costs I'm seeing.

That's all for now. Thanks!

Best regards,
Vladimir Ivanov

[1] "Do not expect to perform any sort of arithmetic on this offset;
      it is just a cookie which is passed to the unsafe heap memory
      accessors."