[9] RFR (X): 8163880: Constant pool caching of fields inhibited/delayed unnecessarily
Zoltán Majó
zoltan.majo at oracle.com
Tue Aug 30 11:24:29 UTC 2016
Hi Coleen,
thank you for the review! Please see details below.
On 08/29/2016 10:31 PM, Coleen Phillimore wrote:
>
> http://cr.openjdk.java.net/~zmajo/8163880/webrev.02/src/share/vm/interpreter/interpreterRuntime.cpp.udiff.html
>
>
> I thought we always want to suppress resolving a static putfield, and
> only want to give the error if >= classfile version 53.
Yes, we throw an IllegalAccessError() only for class file version >=
53. For class file version < 53 we do not to throw an error. Delaying
CP cache resolution is therefore not necessary for class file version <
53. That is why we have the classfile version check in place.
If we remove the classfile version check, we'll delay CP caching for
class file version < 53 as well. That won't hurt correctness and is
will not hurt performance either (I did some experiments to verify that).
Unfortunately, I pushed the fixed before your mail arrived. But we can
open a new bug to handle the issue you have raised. Please let me know
if you'd like that and I can take care of the bug.
>
> *+bool has_initialized_final_update =
> info.field_holder()->major_version() >= 53 &&*
> *+ info.has_initialized_final_update();*
>
>
> So, not check the classfile version here.
>
> If I understand correctly, this only inhibits the quickening for when
> functions other than <clinit> are updating the final field, right?
If inhibits quickening if methods other than <clinit> *or <init>* are
updating the final field.
> I am surprised this hurt performance because I would have thought
> initializing final fields were an infrequent operation.
Here are some more details to answer that.
The CP cache does caching for get and put bytecodes individually.
Normally, the caching for put (in 'put_code') should be set up by put
instructions, the caching for get (in 'get_code') should be set up by
get instruction. The optimization is to let get instructions set up
caching for put bytecodes in some select cases.
1) The table below shows the way that worked before the regression
(i.e., before JDK-8160527). The table shows if put_code is set (S) or
not set (NS) after InterpreterRuntime::resolve_get_put(). The table
applies to both static and instance fields. Furthermore, the table
assumes that class initialization has already taken place (i.e.,
uninitialized_static is false).
| Instruction triggering resolution
|===================================
Field type | get | put
===================================================
final | NS | S
non-final | S | S
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.38
2) Unfortunately, JDK-8160527 changed the table to:
| Instruction triggering resolution
|===================================
Field type | get | put
===================================================
final | NS |*NS*
non-final | *NS* | S
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.30
The cause of the regression reported in JDK-8163880 is that
(1) For final fields, setting up the CP cache for put_code is inhibited
(Row 1). In most class files we have to deal with, only initializers
(<clinit>, <init>) update final fields. Inhibiting CP caching for static
final fields will most likely not impact performance too much, as
<clinit> methods are executed infrequently. Inhibiting CP for instance
fields has most likely have a larger performance impact, as <init>
methods are executed more frequently.
(2) For non-final fields, setting up the CP cache for put_code is not
set if the triggering instruction is a get (i.e., setting up the CP
cache is delayed until a put instruction attempts to set up the CP
cache), which can cause a number of extra calls into the runtime.
The regression is due to both (1) and (2). You pointed out (and I
agree) that (1) has a small effect if the field is static. I think the
effect of (1) is larger if the field is non-static. Also, there might be
a large effect due to (2). But we'd need to do some experiments to
confirm that.
Currently, the system works is as follows.
| Instruction triggering resolution
|===================================
Field type | get | put
===================================================
final | NS | S (for fields in class filed version <
53) / NS (for field in class file version >= 53 that have
initialized final updates)
non-final | S | S
We can change the final/put case to "S for all fields except ones with
initialized final updates)", as you suggested. That is not necessary
for correctness, but won't hurt performance either and could make code
more readable. Please let me know if you'd like a bug to be filed for
that -- I'd be happy to take care of it.
Thank you!
Best regards,
Zoltan
>
> thanks,
> Coleen
>
>
> On 8/29/16 1:28 AM, Zoltán Majó wrote:
>> Thank you, Vladimir, for the review!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>
>> On 08/26/2016 07:58 PM, Vladimir Kozlov wrote:
>>> Looks good.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/26/16 4:38 AM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> thank you for looking at this issue once more.
>>>>
>>>> On 08/25/2016 09:05 PM, Vladimir Kozlov wrote:
>>>>> Okay I think I understand what you are trying to say but it was
>>>>> not my
>>>>> point.
>>>>
>>>> Oh, I see. Then I misunderstood you.
>>>>
>>>>> I was questioning why put_code is set when bytecode could be get
>>>>> instruction. Reading comments and code around I think I understand
>>>>> now.
>>>>> We set both get_code and put_code marking that the field is resolved.
>>>>> But in some cases we delay marking to force field's reresolution.
>>>>
>>>> Yes, that is also my understanding.
>>>>
>>>>>
>>>>> In next code you can use is_static:
>>>>>
>>>>> bool uninitialized_static = ((bytecode ==
>>>>> Bytecodes::_getstatic ||
>>>>> bytecode == Bytecodes::_putstatic) &&
>>>>> !klass->is_initialized());
>>>>
>>>> OK, I've updated the code.
>>>>
>>>>>
>>>>> Code could be rearranged to be more compact (similar to original
>>>>> code):
>>>>>
>>>>> Bytecodes::Code put_code = (Bytecodes::Code)0;
>>>>> Bytecodes::Code get_code = (Bytecodes::Code)0;
>>>>> if (!uninitialized_static) {
>>>>> get_code = ((is_static) ? Bytecodes::_getstatic :
>>>>> Bytecodes::_getfield);
>>>>> if ((is_put && !has_initialized_final_update) ||
>>>>> !info.access_flags().is_final()) {
>>>>> put_code = ((is_static) ? Bytecodes::_putstatic :
>>>>> Bytecodes::_putfield);
>>>>> }
>>>>> }
>>>>
>>>> I agree and have updated the code accordingly.
>>>>
>>>> Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.02/
>>>>
>>>> JPRT testing passes (testset hotspot).
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 8/23/16 5:31 AM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> thank you for the feedback! Please see details below.
>>>>>>
>>>>>> On 08/22/2016 08:20 PM, Vladimir Kozlov wrote:
>>>>>>> Typo (double 'not'):
>>>>>>> + // and the required IllegalAccessError would not not be thrown.
>>>>>>
>>>>>> OK, I've corrected that.
>>>>>>
>>>>>>>
>>>>>>> I think has_initialized_final_update should be part of assert to
>>>>>>> see
>>>>>>> it in hs_err:
>>>>>>>
>>>>>>> + if (has_initialized_final_update) {
>>>>>>> + assert(info.access_flags().is_final(), "Fields with
>>>>>>> initialized
>>>>>>> final updates must be final");
>>>>>>> + }
>>>>>>
>>>>>> Yes, I've updated that as well.
>>>>>>
>>>>>>>
>>>>>>> Why is_put check is not done in outer check? It is missing for
>>>>>>> is_final():
>>>>>>>
>>>>>>> + if (!uninitialized_static) {
>>>>>>> + if ((is_put && !has_initialized_final_update) ||
>>>>>>> !info.access_flags().is_final()) {
>>>>>>
>>>>>> I think it's an optimization to do it this way: For
>>>>>> non-final-fields, we
>>>>>> want to set up CP caching faster (i.e., we want to reduce the
>>>>>> number of
>>>>>> times we call into the runtime to
>>>>>> LinkResolver::resolve_field_access()).We had that optimization in
>>>>>> place
>>>>>> before JDK-8160527, but not afterwards, which caused a performance
>>>>>> regression. Please see some details below.
>>>>>>
>>>>>> The CP cache does caching for get and put bytecodes individually.
>>>>>> Normally, the caching for put (in 'put_code') should be set up by
>>>>>> put
>>>>>> instructions, the caching for get (in 'get_code') should be set
>>>>>> up by
>>>>>> get instruction. The optimization is to let get instructions set up
>>>>>> caching for put bytecodes in some select cases. Here is the way
>>>>>> that
>>>>>> (not) workedbefore/after JDK-8160527.
>>>>>>
>>>>>>
>>>>>> 1) The table below describes the way resolution worked before
>>>>>> JDK-8160527. The table shows if put_code was set (S) or not set
>>>>>> (NS)
>>>>>> after InterpreterRuntime::resolve_get_put().
>>>>>>
>>>>>> | Instruction triggering resolution
>>>>>> |===================================
>>>>>> Field type | get | put
>>>>>> ===================================================
>>>>>> final | NS |S
>>>>>> non-final | S | S
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.38
>>>>>>
>>>>>>
>>>>>> 2) Unfortunately, JDK-8160527 changed the table to:
>>>>>>
>>>>>> | Instruction triggering resolution
>>>>>> |===================================
>>>>>> Field type | get | put
>>>>>> ===================================================
>>>>>> final | NS |*NS*
>>>>>> non-final | *NS* | S
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.30
>>>>>>
>>>>>> So the VM does not set up caching in *2additional cases*, which
>>>>>> causes a
>>>>>> performance regression.
>>>>>>
>>>>>>
>>>>>> 3) The current webrev proposes the following:
>>>>>>
>>>>>> | Instruction triggering resolution
>>>>>> |===================================
>>>>>> Field type | get | put
>>>>>> ===================================================
>>>>>> final | NS |*S (unless
>>>>>> has_initialized_final_update
>>>>>> is true for the field)*
>>>>>> non-final | S | S
>>>>>>
>>>>>> That table looks the same as what we had before JDK-8160527,
>>>>>> except for
>>>>>> fields with initialized final updates. For those fields we need
>>>>>> to defer
>>>>>> setting up CP caching so that the appropriate exception can be
>>>>>> thrown at
>>>>>> the right place (by LinkResolver::resolve_field_access()).
>>>>>>
>>>>>>
>>>>>> 4) If we would do what you suggested (I hope I understood your
>>>>>> suggestion correctly)thenwe'd move the is_put condition to the outer
>>>>>> check:
>>>>>>
>>>>>> + if (!uninitialized_static && is_put) {
>>>>>> + if (!has_initialized_final_update ||
>>>>>> !info.access_flags().is_final()) {
>>>>>>
>>>>>> Then the table would look:
>>>>>>
>>>>>> | Instruction triggering resolution
>>>>>> |===================================
>>>>>> Field type | get | put
>>>>>> ===================================================
>>>>>> final | NS |S(unless
>>>>>> has_initialized_final_update is
>>>>>> true for the field)
>>>>>> non-final | *NS* | S
>>>>>>
>>>>>> So we would have *still one case* where CP caching is not set up and
>>>>>> that would most likely result in some performance regression
>>>>>> (relative
>>>>>> to pre-JDK-8160527).
>>>>>>
>>>>>>
>>>>>> I hope I have not overlooked anythingabove.
>>>>>>
>>>>>> Here is the updated webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.01/
>>>>>>
>>>>>> JPRT testing is running.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 8/22/16 4:20 AM, Zoltán Majó wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> please review the fix for JDK-8163880.
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8163880
>>>>>>>>
>>>>>>>> Problem & solution: The performance regression reported by the
>>>>>>>> bug is
>>>>>>>> caused by unnecessary delay (or even inhibition) of constant pool
>>>>>>>> caching for fields targeted by put instructions. For a more
>>>>>>>> detailed description of the problem and possible solutions, please
>>>>>>>> see the following comment in the bug report [1].
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.00/
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> - performance evaluation with the affected benchmark shows that
>>>>>>>> fix
>>>>>>>> above solves the performance regression;
>>>>>>>> - all hotspot test with -Xmixed and -Xcomp (running).
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8163880?focusedCommentId=13990030&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13990030
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>
More information about the hotspot-dev
mailing list