[9] RFR (X): 8163880: Constant pool caching of fields inhibited/delayed unnecessarily
Zoltán Majó
zoltan.majo at oracle.com
Thu Sep 1 05:59:25 UTC 2016
Hi Coleen,
On 09/01/2016 02:28 AM, Coleen Phillimore wrote:
>
>
> On 8/30/16 7:24 AM, Zoltán Majó wrote:
>> Hi Coleen,
>>
>>
>> thank you for the review! Please see details below.
>>
>> On 08/29/2016 10:31 PM, Coleen Phillimore wrote:
>>>
>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.02/src/share/vm/interpreter/interpreterRuntime.cpp.udiff.html
>>>
>>>
>>> I thought we always want to suppress resolving a static putfield,
>>> and only want to give the error if >= classfile version 53.
>>
>> Yes, we throw an IllegalAccessError() only for class file version >=
>> 53. For class file version < 53 we do not to throw an error.
>> Delaying CP cache resolution is therefore not necessary for class
>> file version < 53. That is why we have the classfile version check
>> in place.
>
> I see where my confusion was now.
Thank you for looking into this issue!
Best regards,
Zoltan
> thanks,
> Coleen
>>
>> If we remove the classfile version check, we'll delay CP caching for
>> class file version < 53 as well. That won't hurt correctness and is
>> will not hurt performance either (I did some experiments to verify
>> that).
>>
>> Unfortunately, I pushed the fixed before your mail arrived. But we
>> can open a new bug to handle the issue you have raised. Please let me
>> know if you'd like that and I can take care of the bug.
>>
>>>
>>> *+bool has_initialized_final_update =
>>> info.field_holder()->major_version() >= 53 &&*
>>> *+ info.has_initialized_final_update();*
>>>
>>>
>>> So, not check the classfile version here.
>>>
>>> If I understand correctly, this only inhibits the quickening for
>>> when functions other than <clinit> are updating the final field, right?
>>
>> If inhibits quickening if methods other than <clinit> *or <init>* are
>> updating the final field.
>>
>>> I am surprised this hurt performance because I would have thought
>>> initializing final fields were an infrequent operation.
>>
>> Here are some more details to answer that.
>>
>> The CP cache does caching for get and put bytecodes individually.
>> Normally, the caching for put (in 'put_code') should be set up by put
>> instructions, the caching for get (in 'get_code') should be set up by
>> get instruction. The optimization is to let get instructions set up
>> caching for put bytecodes in some select cases.
>>
>> 1) The table below shows the way that worked before the regression
>> (i.e., before JDK-8160527). The table shows if put_code is set (S) or
>> not set (NS) after InterpreterRuntime::resolve_get_put(). The table
>> applies to both static and instance fields. Furthermore, the table
>> assumes that class initialization has already taken place (i.e.,
>> uninitialized_static is false).
>>
>> | Instruction triggering resolution
>> |===================================
>> Field type | get | put
>> ===================================================
>> final | NS | S
>> non-final | S | S
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.38
>>
>> 2) Unfortunately, JDK-8160527 changed the table to:
>>
>> | Instruction triggering resolution
>> |===================================
>> Field type | get | put
>> ===================================================
>> final | NS |*NS*
>> non-final | *NS* | S
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.30
>>
>> The cause of the regression reported in JDK-8163880 is that
>>
>> (1) For final fields, setting up the CP cache for put_code is
>> inhibited (Row 1). In most class files we have to deal with, only
>> initializers (<clinit>, <init>) update final fields. Inhibiting CP
>> caching for static final fields will most likely not impact
>> performance too much, as <clinit> methods are executed infrequently.
>> Inhibiting CP for instance fields has most likely have a larger
>> performance impact, as <init> methods are executed more frequently.
>>
>> (2) For non-final fields, setting up the CP cache for put_code is not
>> set if the triggering instruction is a get (i.e., setting up the CP
>> cache is delayed until a put instruction attempts to set up the CP
>> cache), which can cause a number of extra calls into the runtime.
>>
>> The regression is due to both (1) and (2). You pointed out (and I
>> agree) that (1) has a small effect if the field is static. I think
>> the effect of (1) is larger if the field is non-static. Also, there
>> might be a large effect due to (2). But we'd need to do some
>> experiments to confirm that.
>>
>> Currently, the system works is as follows.
>>
>> | Instruction triggering resolution
>> |===================================
>> Field type | get | put
>> ===================================================
>> final | NS | S (for fields in class filed version
>> < 53) / NS (for field in class file version >= 53 that have
>> initialized final updates)
>> non-final | S | S
>>
>> We can change the final/put case to "S for all fields except ones
>> with initialized final updates)", as you suggested. That is not
>> necessary for correctness, but won't hurt performance either and
>> could make code more readable. Please let me know if you'd like a
>> bug to be filed for that -- I'd be happy to take care of it.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 8/29/16 1:28 AM, Zoltán Majó wrote:
>>>> Thank you, Vladimir, for the review!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>
>>>> On 08/26/2016 07:58 PM, Vladimir Kozlov wrote:
>>>>> Looks good.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 8/26/16 4:38 AM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> thank you for looking at this issue once more.
>>>>>>
>>>>>> On 08/25/2016 09:05 PM, Vladimir Kozlov wrote:
>>>>>>> Okay I think I understand what you are trying to say but it was
>>>>>>> not my
>>>>>>> point.
>>>>>>
>>>>>> Oh, I see. Then I misunderstood you.
>>>>>>
>>>>>>> I was questioning why put_code is set when bytecode could be get
>>>>>>> instruction. Reading comments and code around I think I
>>>>>>> understand now.
>>>>>>> We set both get_code and put_code marking that the field is
>>>>>>> resolved.
>>>>>>> But in some cases we delay marking to force field's reresolution.
>>>>>>
>>>>>> Yes, that is also my understanding.
>>>>>>
>>>>>>>
>>>>>>> In next code you can use is_static:
>>>>>>>
>>>>>>> bool uninitialized_static = ((bytecode ==
>>>>>>> Bytecodes::_getstatic ||
>>>>>>> bytecode == Bytecodes::_putstatic) &&
>>>>>>> !klass->is_initialized());
>>>>>>
>>>>>> OK, I've updated the code.
>>>>>>
>>>>>>>
>>>>>>> Code could be rearranged to be more compact (similar to original
>>>>>>> code):
>>>>>>>
>>>>>>> Bytecodes::Code put_code = (Bytecodes::Code)0;
>>>>>>> Bytecodes::Code get_code = (Bytecodes::Code)0;
>>>>>>> if (!uninitialized_static) {
>>>>>>> get_code = ((is_static) ? Bytecodes::_getstatic :
>>>>>>> Bytecodes::_getfield);
>>>>>>> if ((is_put && !has_initialized_final_update) ||
>>>>>>> !info.access_flags().is_final()) {
>>>>>>> put_code = ((is_static) ? Bytecodes::_putstatic :
>>>>>>> Bytecodes::_putfield);
>>>>>>> }
>>>>>>> }
>>>>>>
>>>>>> I agree and have updated the code accordingly.
>>>>>>
>>>>>> Here is the updated webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.02/
>>>>>>
>>>>>> JPRT testing passes (testset hotspot).
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 8/23/16 5:31 AM, Zoltán Majó wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>>
>>>>>>>> thank you for the feedback! Please see details below.
>>>>>>>>
>>>>>>>> On 08/22/2016 08:20 PM, Vladimir Kozlov wrote:
>>>>>>>>> Typo (double 'not'):
>>>>>>>>> + // and the required IllegalAccessError would not not be
>>>>>>>>> thrown.
>>>>>>>>
>>>>>>>> OK, I've corrected that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think has_initialized_final_update should be part of assert
>>>>>>>>> to see
>>>>>>>>> it in hs_err:
>>>>>>>>>
>>>>>>>>> + if (has_initialized_final_update) {
>>>>>>>>> + assert(info.access_flags().is_final(), "Fields with
>>>>>>>>> initialized
>>>>>>>>> final updates must be final");
>>>>>>>>> + }
>>>>>>>>
>>>>>>>> Yes, I've updated that as well.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Why is_put check is not done in outer check? It is missing for
>>>>>>>>> is_final():
>>>>>>>>>
>>>>>>>>> + if (!uninitialized_static) {
>>>>>>>>> + if ((is_put && !has_initialized_final_update) ||
>>>>>>>>> !info.access_flags().is_final()) {
>>>>>>>>
>>>>>>>> I think it's an optimization to do it this way: For
>>>>>>>> non-final-fields, we
>>>>>>>> want to set up CP caching faster (i.e., we want to reduce the
>>>>>>>> number of
>>>>>>>> times we call into the runtime to
>>>>>>>> LinkResolver::resolve_field_access()).We had that optimization
>>>>>>>> in place
>>>>>>>> before JDK-8160527, but not afterwards, which caused a performance
>>>>>>>> regression. Please see some details below.
>>>>>>>>
>>>>>>>> The CP cache does caching for get and put bytecodes individually.
>>>>>>>> Normally, the caching for put (in 'put_code') should be set up
>>>>>>>> by put
>>>>>>>> instructions, the caching for get (in 'get_code') should be set
>>>>>>>> up by
>>>>>>>> get instruction. The optimization is to let get instructions
>>>>>>>> set up
>>>>>>>> caching for put bytecodes in some select cases. Here is the way
>>>>>>>> that
>>>>>>>> (not) workedbefore/after JDK-8160527.
>>>>>>>>
>>>>>>>>
>>>>>>>> 1) The table below describes the way resolution worked before
>>>>>>>> JDK-8160527. The table shows if put_code was set (S) or not
>>>>>>>> set (NS)
>>>>>>>> after InterpreterRuntime::resolve_get_put().
>>>>>>>>
>>>>>>>> | Instruction triggering resolution
>>>>>>>> |===================================
>>>>>>>> Field type | get | put
>>>>>>>> ===================================================
>>>>>>>> final | NS |S
>>>>>>>> non-final | S | S
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.38
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) Unfortunately, JDK-8160527 changed the table to:
>>>>>>>>
>>>>>>>> | Instruction triggering resolution
>>>>>>>> |===================================
>>>>>>>> Field type | get | put
>>>>>>>> ===================================================
>>>>>>>> final | NS |*NS*
>>>>>>>> non-final | *NS* | S
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/29ed49c42486#l1.30
>>>>>>>>
>>>>>>>>
>>>>>>>> So the VM does not set up caching in *2additional cases*, which
>>>>>>>> causes a
>>>>>>>> performance regression.
>>>>>>>>
>>>>>>>>
>>>>>>>> 3) The current webrev proposes the following:
>>>>>>>>
>>>>>>>> | Instruction triggering resolution
>>>>>>>> |===================================
>>>>>>>> Field type | get | put
>>>>>>>> ===================================================
>>>>>>>> final | NS |*S (unless
>>>>>>>> has_initialized_final_update
>>>>>>>> is true for the field)*
>>>>>>>> non-final | S | S
>>>>>>>>
>>>>>>>> That table looks the same as what we had before JDK-8160527,
>>>>>>>> except for
>>>>>>>> fields with initialized final updates. For those fields we need
>>>>>>>> to defer
>>>>>>>> setting up CP caching so that the appropriate exception can be
>>>>>>>> thrown at
>>>>>>>> the right place (by LinkResolver::resolve_field_access()).
>>>>>>>>
>>>>>>>>
>>>>>>>> 4) If we would do what you suggested (I hope I understood your
>>>>>>>> suggestion correctly)thenwe'd move the is_put condition to the
>>>>>>>> outer
>>>>>>>> check:
>>>>>>>>
>>>>>>>> + if (!uninitialized_static && is_put) {
>>>>>>>> + if (!has_initialized_final_update ||
>>>>>>>> !info.access_flags().is_final()) {
>>>>>>>>
>>>>>>>> Then the table would look:
>>>>>>>>
>>>>>>>> | Instruction triggering resolution
>>>>>>>> |===================================
>>>>>>>> Field type | get | put
>>>>>>>> ===================================================
>>>>>>>> final | NS |S(unless
>>>>>>>> has_initialized_final_update is
>>>>>>>> true for the field)
>>>>>>>> non-final | *NS* | S
>>>>>>>>
>>>>>>>> So we would have *still one case* where CP caching is not set
>>>>>>>> up and
>>>>>>>> that would most likely result in some performance regression
>>>>>>>> (relative
>>>>>>>> to pre-JDK-8160527).
>>>>>>>>
>>>>>>>>
>>>>>>>> I hope I have not overlooked anythingabove.
>>>>>>>>
>>>>>>>> Here is the updated webrev:
>>>>>>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.01/
>>>>>>>>
>>>>>>>> JPRT testing is running.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 8/22/16 4:20 AM, Zoltán Majó wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> please review the fix for JDK-8163880.
>>>>>>>>>>
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8163880
>>>>>>>>>>
>>>>>>>>>> Problem & solution: The performance regression reported by
>>>>>>>>>> the bug is
>>>>>>>>>> caused by unnecessary delay (or even inhibition) of constant
>>>>>>>>>> pool
>>>>>>>>>> caching for fields targeted by put instructions. For a more
>>>>>>>>>> detailed description of the problem and possible solutions,
>>>>>>>>>> please
>>>>>>>>>> see the following comment in the bug report [1].
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8163880/webrev.00/
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>> - performance evaluation with the affected benchmark shows
>>>>>>>>>> that fix
>>>>>>>>>> above solves the performance regression;
>>>>>>>>>> - all hotspot test with -Xmixed and -Xcomp (running).
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Zoltan
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8163880?focusedCommentId=13990030&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13990030
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>
More information about the hotspot-dev
mailing list