RFR: Caching MethodType's descriptor string improves lambda linkage performance

Tue Sep 17 20:59:59 UTC 2013

On Sep 17, 2013, at 11:19 AM, Sergey Kuksenko <sergey.kuksenko at oracle.com> wrote:

> On 09/17/2013 04:41 AM, John Rose wrote:
>> The algorithmic change (string cache) is acceptable, although it will tend to increase footprint somewhat.  
> I don't think that it would be visible footprint increasing. MethodTypes
> are interned by implementation, so I don't expect that this "overhead"
> will be significant.

That's reasonable for now.  You should know that the 292 API does not mandate MethodType interning, and that future versions could possibly drop it.  But if we did we'd take all current caches, including this one, into consideration.

One problem with MT interning is that a lookup of a virtual method in a small class C has to create a direct MH whose type mentions C as a leading parameter type.  In many use cases, this type then gets dropped in favor of superclass, such as Object.  The MT then goes dead almost as soon as it was created.

>> We can tweak that later if necessary.  There are probably only a small number of such strings, so maybe a small leaky map would do the trick as well. 
> In case of small amount of such strings we will get a huge overhead from
> any kind of map.

I would expect O(10^1.5) memory references and branches.  Depends on what you mean by "huge".

> From performance point of view map may acceptable when
> we have a lot of such strings. The key fact here that we have a few
> amount of MethodTypes, but calls "toMethodDescriptorString@ very frequent.

Would there be any benefit from recording the original string from the constant pool?  The JVM path for this is SystemDictionary::find_method_handle_type, which makes an upcall to MethodHandleNatives.findMethodHandleType.  Currently only the ptypes and rtype are passed, but the JVM could also convert the internal Symbol to a java.lang.String and pass that also.  In that case, MT's created by JVM upcalls would be pre-equipped with descriptor strings.

This is probably worth an experiment, although it requires some JVM work.

>> But, please don't hand-inline methods (checkPtype, checkSlotCount).  That is usually the wrong answer.  
>> If there is a JIT inlining decision that isn't going right, we need to fix that, not distort the Java code.
> Unfortunately it is not a case when we may rely on JIT. Certainly, there
> are no perf problems here (except caching) after JIT (at least C2, I did
> n't check C1). But we have huge performance problems/impact here while
> interpreting. I know (I think) a really general solution for such perf
> problems. Besides it will be useful in a big amount of other places.  It
> is a bytecode level inline before interpretation.

As long as such transforms are online, it is better to run it through C1, which is about as fast as a bytecode transform and produces more a favorable end result.  To make that happen, consider pre-marking some methods with inflated invocation counts, so that they are queued for C1 compilation as soon as they start getting used.

Although that would be an unorthodox tactic for hotspot, it is an easier change to make, for similar performance gains, to any upstream code transformation.  (Eventually we'll figure out AOT, and then all this stuff will be moot.)

> Let's back to the patch.
> 
> checkPType is method which performs two _different_ actions. The first
> is check that type in not null and type is not void. The second action
> is calculate size of extra parameter slot (lond/double types).

You got me there.  As a compiler guy I pushed the second computation into the subroutine because I knew it would go dead if the value was unused.  I see that it bumps up the bootstrap cost for no benefit.

> So we
> came to idea that we need two different methods as from performance and
> OOP purity points of view. checkPType has exactly two usages. And the
> second usage don't need calculate size of parameters slots. So one
> action used only in the single place and we get more readable code if we
> inline it. The second action used twice, but it a very simple action
> (NPE and void check) and I think here we may do manual inline because of
> it is give us visible benefits and don't create any problems as for
> later JITing and code reading.

Rather than hand-inlining, then, please split checkPtype into checkPtype and ptypeSlotCount.  Leave checkPtypes alone, since it is probably marginally helpful to do both jobs in one pass over the data.

>> Also, I think you deleted a call to checkSlotCount that should still be there to detect illegal inputs.  One of the costs of hand-inlining is bitrot like that.
> Oops.
> You are right. I did a mistake.

I hope you get my overall point:  Hand optimizations have a continuing cost, related to obscuring the logical structure of the code.  The transforming engineer might make a mistake, and later maintaining engineers might also make mistakes.

https://blogs.oracle.com/jrose/entry/three_software_proverbs

> I eliminate one checkSlotCount usage by
> error, but I didn't inline anything in that place. So it is not a cost
> of hand-inlining, it is a cost of my inadvertence when I eliminame a
> whole code line by misclick and didn't noticed that.
> 
> 
> Tomorrow I'll do other webrev -  will try to pay attention to all
> comments and we may continue the discussion.

Good.

— John

> 
>> 
>> — John
>> 
>> On Sep 11, 2013, at 12:03 PM, Sergey Kuksenko <sergey.kuksenko at oracle.com> wrote:
>> 
>>> On 09/11/2013 09:01 PM, Aleksey Shipilev wrote:
>>>> On 09/11/2013 08:23 PM, Sergey Kuksenko wrote:
>>>>> http://cr.openjdk.java.net/~skuksenko/jsr335/8024635/webrev.00/
>>>> 
>>>> As much as I hate to see the hand code tweaking instead of relying on
>>>> compiler to do it's job, I understand this is about interpreter. Seems
>>>> good then.
>>>> 
>>>> * Formatting: "if(...)" should be "if (...")
>>> Will do
>>>> 
>>>> * Formatting: "//NPE" should be "// null check"
>>> I just preserve exiting comments.
>>> Decided don't modify original code which is not important to required
>>> functionality.
>>> 
>>>> 
>>>> * Formatting: "desc =  " should be "desc = "
>>>> 
>>>> * Formatting: this one should not use braces (for consistency with other
>>>> usages)?
>>>> 364         if(nptype == null) { //NPE
>>>> 365             throw new NullPointerException();
>>>> 366         }
>>> Let ask code owner.
>>>> 
>>>> * Explicit null-checks: implicits via .getClass and .equals always
>>>> bothered me in j.l.i.*; the idea was seemingly to piggyback on the
>>>> compiler intrinsics. 
>>> anyway it is doesn't matter after C1/C2
>>> 
>>>> Any idea what's the cost of using
>>>> Objects.requireNonNull there?
>>> If we are running under the interpreter Objects.requireNonNull costs
>>> enough to eliminate benefits from this fine tuning.
>>> 
>>> -- 
>>> Best regards,
>>> Sergey Kuksenko
>> 
> 
> 
> -- 
> Best regards,
> Sergey Kuksenko