[foreign] RFR 8224481: Optimize struct getter and field getter paths.
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue May 21 23:15:02 UTC 2019
On 21/05/2019 20:16, Jorn Vernee wrote:
> Although, now that you bring it up, I tried re-running some of the
> samples (hadn't done that yet), and I'm seeing some infinite
> recursion. This is seemingly caused by a circular type reference (e.g.
> linked list). i.e. to spin the impl of an accessor we need the
> LayoutType of the struct itself, which then tries to spin the impl
> again, and so on. I guess this isn't a test case in our suite yet...
>
> I'll look into this.
Good detective work! I guess it would make sense to try and reduce it
down to a simpler test, and push the test first.
Where I was going with this is - your patch effectively made the lazy
resolution inside StructImplGenerator useless. If we really want to
explore that option, then we should, I think, remove all lazy resolution
sites and see what happens. It is possible that we don't rely so much on
laziness as we did in the past (we did some fixes few months ago which
stabilized resolution quite a bit) - in which case we can remove the
resolution requests, although - I have to admit - I'm a bit skeptical.
After all all you need it something like this (as you say):
struct foo {
struct foo *next;
}
Which is kind of the killer app for unresolved layouts in the first place.
This is translated into a struct interface which has a getter of
Pointer<foo>. To generate the getter you need to compute its LayoutType
which is a pointer LayoutType, so you have to compute the pointee
LayoutType which brings you back where you started (the whole 'foo'
LayoutType). In other words, since now the creation of LayoutType<foo>
requires the generation of the struct impl for 'foo' and since that
depends (indirectly, through the pointer getter) on being able to
produce a LayoutType<foo>, you get a circularity.
One thing we could try is - instead of eagerly creating the struct impl,
why don't we let the Reference.OfStruct having some mutable state in it?
That is, we could start off with Reference getter which does the
expensive refelective lookup - but then, once it has discovered the
constructor MH, it can stash it in some field (which is private to that
reference object) and use it later if the getter is used again. Then,
you probably still need a ClassValue to stash a mapping between a Class
and its Reference.OfStruct; but it seems like this could fit in more
naturally?
Maurizio
>
> Thanks,
> Jorn
>
> Jorn Vernee schreef op 2019-05-21 21:06:
>> Since we have the resolution context for NativeHeader, AFAIK there is
>> no more difference between the resolution call done by
>> StructImpleGenerator, and the one done by LayoutTypeImpl.ofStruct. So
>> I don't think there are any more cases where we would have succeeded
>> to resolve the Struct layout be delaying spinning the impl. At least
>> the tests haven't caught such a case.
>>
>> The other thing is that the partial layout for the getter is caught in
>> StructImplGenerator, but for the setter it's caught when calling
>> bitSize on Unresolved. Saying layouts should be able to be resolved
>> when calling LayoutType.ofStruct means we can use References.OfGrumpy,
>> which makes the two more uniform.
>>
>> I have some ideas for keeping the lazy init semantics, but it's a bit
>> more complex (using a MutableCallSite to mimic indy), and I'm not sure
>> it will work as well.
>>
>> And, well, there was some talk about eagerly spinning the
>> implementations any ways :)
>>
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-05-21 20:09:
>>> Looks good, although I'm a bit worried about the change in semantics
>>> w.r.t. eager instantiation. The binder will create a lot of
>>> LayoutTypes when generating the implementation - I wonder there were
>>> cases before where we created a partial layout type, which then got
>>> resolved correctly by the time it was dereferenced (since we do
>>> another resolve lazily in StructImplGenerator [1]).
>>>
>>> [1] -
>>> http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
>>>
>>>
>>>
>>> On 21/05/2019 14:41, Jorn Vernee wrote:
>>>> Hi,
>>>>
>>>> After the recent string of benchmarking [1], I've arrived at 2
>>>> optimizations to improve the speed of the measured code path.
>>>>
>>>> 1.) Specialization of Struct getter MethodHandles per struct class.
>>>> 2.) Implementation of RuntimeSupport::casterImpl that does a fused
>>>> cast and offset operation, to avoid creating multiple Pointer objects.
>>>>
>>>> The benchmark:
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
>>>> The optimizations:
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>>>>
>>>> I've split these into 2 so that it's easier to run the benchmarks
>>>> with and without the optimizations. (benchmark uses the OpenJDK's
>>>> builtin framework [2]).
>>>>
>>>> Since we're now more eagerly instantiating the struct impl class I
>>>> had to work around partial struct types, since spinning the impl
>>>> requires a non-partial type and now we're spinning the impl when
>>>> creating the LayouType for the struct, as opposed to on the first
>>>> dereference. To do this I'm detecting whether the struct is partial
>>>> in LayoutType.ofStruct, and using a Reference.OfGrumpy in the case
>>>> where it can not be resolved. Tbh, I think this makes things a
>>>> little more clear as well as far as where/how the exception for
>>>> deref of a partial type is thrown.
>>>>
>>>> Results on my machine before the optimization are:
>>>>
>>>> Benchmark Mode Cnt Score Error Units
>>>> GetStruct.jni_baseline avgt 50 14.204 ▒ 0.566 ns/op
>>>> GetStruct.panama_get_both avgt 50 507.638 ▒ 19.462 ns/op
>>>> GetStruct.panama_get_fieldonly avgt 50 90.236 ▒ 11.027 ns/op
>>>> GetStruct.panama_get_structonly avgt 50 370.783 ▒ 13.744 ns/op
>>>>
>>>> And after:
>>>>
>>>> Benchmark Mode Cnt Score Error Units
>>>> GetStruct.jni_baseline avgt 50 13.941 ▒ 0.485 ns/op
>>>> GetStruct.panama_get_both avgt 50 41.199 ▒ 1.632 ns/op
>>>> GetStruct.panama_get_fieldonly avgt 50 33.432 ▒ 1.889 ns/op
>>>> GetStruct.panama_get_structonly avgt 50 13.469 ▒ 0.781 ns/op
>>>>
>>>> Where panama_get_structonly corresponds to 1., and
>>>> panama_get_fieldonly corresponds to 2. For a total of about 12x
>>>> speedup.
>>>>
>>>> Thanks,
>>>> Jorn
>>>>
>>>> [1] :
>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
>>>>
>>>> [2] : https://openjdk.java.net/jeps/230
More information about the panama-dev
mailing list