[foreign] RFR 8224481: Optimize struct getter and field getter paths.
Jorn Vernee
jbvernee at xs4all.nl
Wed May 22 09:56:06 UTC 2019
Good suggestion! This solves the problem, is nice and simple, and keeps
the same times in the benchmark.
Updated webrev:
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.02/
(only changes to References.java)
I've added a test for the failure. I think that can be included as well?
I re-ran the samples I have as well, and this time it's all green.
Thanks,
Jorn
Maurizio Cimadamore schreef op 2019-05-22 01:15:
> On 21/05/2019 20:16, Jorn Vernee wrote:
>> Although, now that you bring it up, I tried re-running some of the
>> samples (hadn't done that yet), and I'm seeing some infinite
>> recursion. This is seemingly caused by a circular type reference (e.g.
>> linked list). i.e. to spin the impl of an accessor we need the
>> LayoutType of the struct itself, which then tries to spin the impl
>> again, and so on. I guess this isn't a test case in our suite yet...
>>
>> I'll look into this.
>
> Good detective work! I guess it would make sense to try and reduce it
> down to a simpler test, and push the test first.
>
> Where I was going with this is - your patch effectively made the lazy
> resolution inside StructImplGenerator useless. If we really want to
> explore that option, then we should, I think, remove all lazy
> resolution sites and see what happens. It is possible that we don't
> rely so much on laziness as we did in the past (we did some fixes few
> months ago which stabilized resolution quite a bit) - in which case we
> can remove the resolution requests, although - I have to admit - I'm a
> bit skeptical. After all all you need it something like this (as you
> say):
>
> struct foo {
> struct foo *next;
> }
>
> Which is kind of the killer app for unresolved layouts in the first
> place.
>
> This is translated into a struct interface which has a getter of
> Pointer<foo>. To generate the getter you need to compute its
> LayoutType which is a pointer LayoutType, so you have to compute the
> pointee LayoutType which brings you back where you started (the whole
> 'foo' LayoutType). In other words, since now the creation of
> LayoutType<foo> requires the generation of the struct impl for 'foo'
> and since that depends (indirectly, through the pointer getter) on
> being able to produce a LayoutType<foo>, you get a circularity.
>
> One thing we could try is - instead of eagerly creating the struct
> impl, why don't we let the Reference.OfStruct having some mutable
> state in it? That is, we could start off with Reference getter which
> does the expensive refelective lookup - but then, once it has
> discovered the constructor MH, it can stash it in some field (which is
> private to that reference object) and use it later if the getter is
> used again. Then, you probably still need a ClassValue to stash a
> mapping between a Class and its Reference.OfStruct; but it seems like
> this could fit in more naturally?
>
> Maurizio
>
>>
>> Thanks,
>> Jorn
>>
>> Jorn Vernee schreef op 2019-05-21 21:06:
>>> Since we have the resolution context for NativeHeader, AFAIK there is
>>> no more difference between the resolution call done by
>>> StructImpleGenerator, and the one done by LayoutTypeImpl.ofStruct. So
>>> I don't think there are any more cases where we would have succeeded
>>> to resolve the Struct layout be delaying spinning the impl. At least
>>> the tests haven't caught such a case.
>>>
>>> The other thing is that the partial layout for the getter is caught
>>> in
>>> StructImplGenerator, but for the setter it's caught when calling
>>> bitSize on Unresolved. Saying layouts should be able to be resolved
>>> when calling LayoutType.ofStruct means we can use
>>> References.OfGrumpy,
>>> which makes the two more uniform.
>>>
>>> I have some ideas for keeping the lazy init semantics, but it's a bit
>>> more complex (using a MutableCallSite to mimic indy), and I'm not
>>> sure
>>> it will work as well.
>>>
>>> And, well, there was some talk about eagerly spinning the
>>> implementations any ways :)
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore schreef op 2019-05-21 20:09:
>>>> Looks good, although I'm a bit worried about the change in semantics
>>>> w.r.t. eager instantiation. The binder will create a lot of
>>>> LayoutTypes when generating the implementation - I wonder there were
>>>> cases before where we created a partial layout type, which then got
>>>> resolved correctly by the time it was dereferenced (since we do
>>>> another resolve lazily in StructImplGenerator [1]).
>>>>
>>>> [1] -
>>>> http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
>>>> On 21/05/2019 14:41, Jorn Vernee wrote:
>>>>> Hi,
>>>>>
>>>>> After the recent string of benchmarking [1], I've arrived at 2
>>>>> optimizations to improve the speed of the measured code path.
>>>>>
>>>>> 1.) Specialization of Struct getter MethodHandles per struct class.
>>>>> 2.) Implementation of RuntimeSupport::casterImpl that does a fused
>>>>> cast and offset operation, to avoid creating multiple Pointer
>>>>> objects.
>>>>>
>>>>> The benchmark:
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
>>>>> The optimizations:
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>>>>>
>>>>> I've split these into 2 so that it's easier to run the benchmarks
>>>>> with and without the optimizations. (benchmark uses the OpenJDK's
>>>>> builtin framework [2]).
>>>>>
>>>>> Since we're now more eagerly instantiating the struct impl class I
>>>>> had to work around partial struct types, since spinning the impl
>>>>> requires a non-partial type and now we're spinning the impl when
>>>>> creating the LayouType for the struct, as opposed to on the first
>>>>> dereference. To do this I'm detecting whether the struct is partial
>>>>> in LayoutType.ofStruct, and using a Reference.OfGrumpy in the case
>>>>> where it can not be resolved. Tbh, I think this makes things a
>>>>> little more clear as well as far as where/how the exception for
>>>>> deref of a partial type is thrown.
>>>>>
>>>>> Results on my machine before the optimization are:
>>>>>
>>>>> Benchmark Mode Cnt Score Error Units
>>>>> GetStruct.jni_baseline avgt 50 14.204 ▒ 0.566 ns/op
>>>>> GetStruct.panama_get_both avgt 50 507.638 ▒ 19.462 ns/op
>>>>> GetStruct.panama_get_fieldonly avgt 50 90.236 ▒ 11.027 ns/op
>>>>> GetStruct.panama_get_structonly avgt 50 370.783 ▒ 13.744 ns/op
>>>>>
>>>>> And after:
>>>>>
>>>>> Benchmark Mode Cnt Score Error Units
>>>>> GetStruct.jni_baseline avgt 50 13.941 ▒ 0.485 ns/op
>>>>> GetStruct.panama_get_both avgt 50 41.199 ▒ 1.632 ns/op
>>>>> GetStruct.panama_get_fieldonly avgt 50 33.432 ▒ 1.889 ns/op
>>>>> GetStruct.panama_get_structonly avgt 50 13.469 ▒ 0.781 ns/op
>>>>>
>>>>> Where panama_get_structonly corresponds to 1., and
>>>>> panama_get_fieldonly corresponds to 2. For a total of about 12x
>>>>> speedup.
>>>>>
>>>>> Thanks,
>>>>> Jorn
>>>>>
>>>>> [1] :
>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
>>>>> [2] : https://openjdk.java.net/jeps/230
More information about the panama-dev
mailing list