[foreign] RFR 8224481: Optimize struct getter and field getter paths.
Jorn Vernee
jbvernee at xs4all.nl
Tue May 21 19:06:54 UTC 2019
Since we have the resolution context for NativeHeader, AFAIK there is no
longer any difference between the resolution call done by
StructImplGenerator and the one done by LayoutTypeImpl.ofStruct. So I
don't think there are any cases left where we would have succeeded in
resolving the Struct layout by delaying spinning the impl. At least the
tests haven't caught such a case.
The other thing is that the partial layout for the getter is caught in
StructImplGenerator, but for the setter it's caught when calling bitSize
on Unresolved. Requiring that layouts be resolvable by the time
LayoutType.ofStruct is called means we can use References.OfGrumpy for
both, which makes the two more uniform.
I have some ideas for keeping the lazy init semantics, but it's a bit
more complex (using a MutableCallSite to mimic indy), and I'm not sure
it will work as well.
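
A minimal sketch of what such a MutableCallSite-based lazy init could look like, assuming the expensive step (e.g. spinning the impl) can be expressed as a Supplier of a MethodHandle. The names LazyHandle, lazy, resolve and demo are hypothetical and not part of the webrev; the demo resolves String::length only as a stand-in:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.function.Supplier;

// Hypothetical helper, not part of the webrev: defers an expensive
// resolution step until the first invocation, like an indy bootstrap.
public final class LazyHandle {
    private final MutableCallSite site;
    private final Supplier<MethodHandle> resolver;

    private LazyHandle(MethodType type, Supplier<MethodHandle> resolver)
            throws ReflectiveOperationException {
        this.site = new MutableCallSite(type);
        this.resolver = resolver;
        MethodHandle resolve = MethodHandles.lookup()
                .findVirtual(LazyHandle.class, "resolve",
                        MethodType.methodType(MethodHandle.class))
                .bindTo(this);
        // Initial target: call resolve(), then invoke the handle it
        // returns with the original arguments.
        site.setTarget(MethodHandles.foldArguments(
                MethodHandles.exactInvoker(type), resolve));
    }

    // Runs on calls that still see the initial target; it may run more
    // than once under races, so the resolver should be idempotent.
    MethodHandle resolve() {
        MethodHandle target = resolver.get().asType(site.type());
        site.setTarget(target); // later calls skip the resolution step
        return target;
    }

    public static MethodHandle lazy(MethodType type,
                                    Supplier<MethodHandle> resolver) {
        try {
            return new LazyHandle(type, resolver).site.dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {first result, second result, number of resolutions}.
    public static int[] demo() {
        int[] resolutions = {0};
        MethodHandle length = lazy(
                MethodType.methodType(int.class, String.class), () -> {
                    resolutions[0]++;
                    try {
                        return MethodHandles.lookup().findVirtual(
                                String.class, "length",
                                MethodType.methodType(int.class));
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException(e);
                    }
                });
        try {
            int first = (int) length.invokeExact("abc");
            int second = (int) length.invokeExact("hello");
            return new int[] {first, second, resolutions[0]};
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " resolutions=" + r[2]);
    }
}
```

The extra indirection through the call site's dynamic invoker is the part I'm not sure about performance-wise; target updates are also not guaranteed to be visible to other threads right away without MutableCallSite.syncAll.
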
And, well, there was some talk about eagerly spinning the
implementations anyway :)
Jorn
Maurizio Cimadamore wrote on 2019-05-21 20:09:
> Looks good, although I'm a bit worried about the change in semantics
> w.r.t. eager instantiation. The binder will create a lot of
> LayoutTypes when generating the implementation - I wonder if there were
> cases before where we created a partial layout type, which then got
> resolved correctly by the time it was dereferenced (since we do
> another resolve lazily in StructImplGenerator [1]).
>
> [1] -
> http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
>
>
> On 21/05/2019 14:41, Jorn Vernee wrote:
>> Hi,
>>
>> After the recent string of benchmarking [1], I've arrived at 2
>> optimizations to improve the speed of the measured code path.
>>
>> 1.) Specialization of Struct getter MethodHandles per struct class.
>> 2.) Implementation of RuntimeSupport::casterImpl that does a fused
>> cast and offset operation, to avoid creating multiple Pointer objects.
>>
>> The benchmark:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
>> The optimizations:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>>
>> I've split these into 2 so that it's easier to run the benchmarks with
>> and without the optimizations. (benchmark uses the OpenJDK's builtin
>> framework [2]).
>>
>> Since we're now more eagerly instantiating the struct impl class I had
>> to work around partial struct types, since spinning the impl requires
>> a non-partial type and now we're spinning the impl when creating the
>> LayoutType for the struct, as opposed to on the first dereference. To
>> do this I'm detecting whether the struct is partial in
>> LayoutType.ofStruct, and using a References.OfGrumpy in the case where
>> it cannot be resolved. Tbh, I think this makes things a little more
>> clear as well as far as where/how the exception for deref of a partial
>> type is thrown.
>>
>> Results on my machine before the optimization are:
>>
>> Benchmark                        Mode  Cnt    Score    Error  Units
>> GetStruct.jni_baseline           avgt   50   14.204 ±  0.566  ns/op
>> GetStruct.panama_get_both        avgt   50  507.638 ± 19.462  ns/op
>> GetStruct.panama_get_fieldonly   avgt   50   90.236 ± 11.027  ns/op
>> GetStruct.panama_get_structonly  avgt   50  370.783 ± 13.744  ns/op
>>
>> And after:
>>
>> Benchmark                        Mode  Cnt    Score    Error  Units
>> GetStruct.jni_baseline           avgt   50   13.941 ±  0.485  ns/op
>> GetStruct.panama_get_both        avgt   50   41.199 ±  1.632  ns/op
>> GetStruct.panama_get_fieldonly   avgt   50   33.432 ±  1.889  ns/op
>> GetStruct.panama_get_structonly  avgt   50   13.469 ±  0.781  ns/op
>>
>> Here panama_get_structonly corresponds to 1., and
>> panama_get_fieldonly corresponds to 2. Combined (panama_get_both),
>> that is a total speedup of about 12x.
>>
>> Thanks,
>> Jorn
>>
>> [1] :
>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
>> [2] : https://openjdk.java.net/jeps/230
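
For reference, the per-benchmark speedups implied by the two result tables quoted above can be worked out from the scores alone (error bars ignored). The class and method names here are only for illustration; the numbers are copied from the tables:

```java
public final class Speedup {
    // Ratio of average time before to average time after (both in ns/op).
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        // Scores taken from the before/after tables in the quoted mail.
        System.out.printf("panama_get_both:       %.1fx%n",
                speedup(507.638, 41.199));
        System.out.printf("panama_get_structonly: %.1fx%n",
                speedup(370.783, 13.469));
        System.out.printf("panama_get_fieldonly:  %.1fx%n",
                speedup(90.236, 33.432));
    }
}
```

That works out to roughly 12.3x for the combined path, 27.5x for the struct getter alone and 2.7x for the field getter alone.
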