On constant folding of final field loads
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Sat Jun 27 01:27:08 UTC 2015
Hi there,
Recently I started looking at constant folding of loads from instance
final fields:
https://bugs.openjdk.java.net/browse/JDK-8058164
I made some progress and wanted to share my findings and initiate a
discussion about the problems I spotted.
Current prototype:
http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk
The idea is simple: JIT tracks final field changes and throws away
nmethods which are affected.
There are 2 parts of the problem:
- how to track changes of final field
- how to manage relation between final fields and nmethods;
I. Tracking changes of final fields
There are 4 ways to circumvent runtime limitations and change a final
field value:
- Reflection API (Field.setAccessible())
- Unsafe
- JNI
- java.lang.invoke (MethodHandles)
(It's also possible to write to a final field in a constructor, but I
consider it as a corner case and haven't addressed yet. VM can ignore )
Since Reflection & java.lang.invoke APIs use Unsafe, it ends up with
only 2 cases: JNI & Unsafe.
For JNI it's possible to encode field "finality" in jfieldID and check
corresponding bit in Set*Field JNI methods before updating a field.
There are already some data encoded in the field ID, so extending it to
record final bit as well went pretty smooth.
For Unsafe it's much more complex.
I started with a similar approach (mostly implemented in the current
prototype) - record "finality" bit in offset cookie and check it when
performing a write.
Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly
states that returned value is not guaranteed to be a byte offset [1],
after following that road I don't see how offset encoding scheme can be
changed.
First of all, there are Unsafe.get*Unaligned methods (added in 9), which
require byte offset (Unsafe.getLong()):
"Fetches a value at some byte offset into a given Java object
...
The specification of this method is the same as {@link
#getLong(Object, long)} except that the offset does not need to
have been obtained from {@link #objectFieldOffset} on the
{@link java.lang.reflect.Field} of some Java field."
Unsafe.getInt supports 3 addressing modes:
(1) NULL + address
(2) oop + offset
(3) base + index * scale
Since there are no methods in Unsafe to get byte offsets, there's no way
to make (3) work with non-byte offsets for Unaligned versions. Both base
and scale should be either byte offsets or offset cookies to make things
work. You can get a sense of the problems looking into Unsafe & java.nio
hacks I did to make things somewhat function after switching offset
encoding strategy.
Also, Unsafe.copyMemory() doesn't work well with offset cookies (see
java.nio.Bits changes I did). Though source and destination addressing
shares the same mode with Unsage.getInt() et al., the size of the copied
region is defined in bytes. So, in order to perform bulk copies of
consecutive memory blocks, the user should be able to convert offset
cookie to byte offset and vice versa. There's no way to solve that with
current API right now.
I don't want to touch compatibility concerns of switching from byte
offsets to encoded offsets, but it looks like Unsafe API needs some
overhaul in 9 to make offset encoding viable.
More realistically, since there are external dependencies on Unsafe API,
I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when
they are available in 9) all over JDK. Or temporarily make a private
copy (finally :-)) of field accessors from Unsafe, switch it to encoded
offsets, and use it in Reflection & java.lang.invoke API.
Regarding alternative approaches to track the finality, an offset bitmap
on per-class basis can be used (containing locations of final fields).
Possible downsides are: (1) memory footprint (1/8th of instance size per
class); and (2) more complex checking logic (load a relevant piece of a
bitmap from a klass, instead of checking locally available offset
cookie). The advantage is that it is completely transparent to a user:
it doesn't change offset translation scheme.
II. Managing relations between final fields and nmethods
Nmethods dependencies suits that purpose pretty well, but some
enhancements are needed.
I envision 2 types of dependencies: (1) per-class (field holder); and
(2) per-instance (value holder). Field holder is used as a context.
Unless a final field is changed, there's no need to track per-instance
dependency. VM optimistically starts in per-class mode and switch to
per-instance mode when it sees a field change. The policy is chosen on
per-class basis. VM should be pretty conservative, since false positives
are expensive - a change of unrelated field causes recompilation of all
nmethods where the same field was inlined (even if the value was taken
from a different instance). Aliasing also causes false positives (same
instance, but different final field), so fields in the same class should
be differentiated as well.
Unilke methods, fields don't have any dedicated metadata associated with
them. All data is confined in holder klass. To be able to identify a
field in a dependency, byte offset can be used. Right now, dependency
management machinery supports only oops and metadata. So, it should be
extended to support primitive values in dependencies (haven't done yet).
Byte offset + per-instance modes completely eliminates false positives.
Another aspect is how expensive dependency checking becomes.
I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle
inlining heavily relies on constant folding of instance final fields.
Before After
checks (#) 420 12,5K
nmethods checked(#) 3K 1,5M
total time: 60ms 2s
deps total 19K 26K
Though total number of dependencies in VM didn't change much (+37% =
19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) and
time spent on dependency checking (30x: 60ms -> 2s) dramatically increased.
The reason is that constant field value dependencies created heavily
populated contextes which are regularly checked:
#1 #2 #3/#4
Before
KlassDep 254 47/2,632
CallSiteDep 167 46/ 358
After
ConstantFieldDep 11,790 0/1,494,112
KlassDep 286 41/ 2,769
CallSiteDep 249 58/ 393
(#1 - dependency kind; #2 - total number of unique dependencies;
#3/#4 - invalidated nmethods/checked dependencies)
I have 3 ideas how to improve performance of dependency checking:
(1) split dependency context list (nmethodBucket) into 3 independent
lists (Klass, CallSite & ConstantValue); (IMPLEMENTED)
It trades size for speed - duplicate nmethods are possible, but the
lists should be shorter on average. I already implemented it, but it
didn't improve the benchmark I'm playing with, since the fraction of
CallSite/Klass deps is very small compared to ConstantField.
(2) group nmethodBucket entries into chunks of k-nmethods; (TODO)
It should improve nmethod iteration speed in heavily populated contexts.
(3) iterate only dependencies of appropriate kind; (TODO)
There are 3 kinds of changes which require dependency checking: changes
in CHA (KlassDepChange), call site target change (CallSiteDepChange),
and constant field value change (ConstantFieldDepChange). Different
types of changes affect disjoint sets of dependencies. So, instead of
enumerating all dependencies in a nmethod, a more focused approach can
be used (e.g. check only call_site_target_value deps for
CallSiteDepChange).
Since dependencies are sorted by type when serialized in a nmethod, it's
possible to compute offsets for 3 disjoint sets and use them in
DepStream to iterate only relevant dependencies.
I hope it'll significantly reduce dependency checking costs I'm seeing.
That's all for now. Thanks!
Best regards,
Vladimir Ivanov
[1] "Do not expect to perform any sort of arithmetic on this offset;
it is just a cookie which is passed to the unsafe heap memory
accessors."
More information about the hotspot-compiler-dev
mailing list