[foreign] RFR 8224481: Optimize struct getter and field getter paths.
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue May 21 18:09:20 UTC 2019
Looks good, although I'm a bit worried about the change in semantics
w.r.t. eager instantiation. The binder will create a lot of LayoutTypes
when generating the implementation - I wonder whether there were cases before
where we created a partial layout type, which then got resolved
correctly by the time it was dereferenced (since we do another resolve
lazily in StructImplGenerator [1]).
[1] -
http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
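To make the concern about resolve timing concrete, here is a minimal sketch in plain Java. It is not the binder or StructImplGenerator code: the class, the map standing in for the resolution context, and the handle methods are all hypothetical names chosen for illustration. It only shows how a layout that becomes known after handle creation is picked up by a lazy resolve but missed by an eager one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins; none of these names come from the Panama code base.
final class ResolveTimingSketch {
    // Toy resolution context: struct name -> size in bytes, once the full layout is known.
    static final Map<String, Integer> KNOWN_LAYOUTS = new HashMap<>();

    // Lazy: the layout is looked up only when the handle is dereferenced.
    static Supplier<Integer> lazyHandle(String structName) {
        return () -> resolve(structName);
    }

    // Eager: the layout is looked up when the handle is created. If it is still
    // partial at that point, the handle throws on every later dereference.
    static Supplier<Integer> eagerHandle(String structName) {
        Integer size = KNOWN_LAYOUTS.get(structName);
        if (size == null) {
            return () -> {
                throw new UnsupportedOperationException("partial layout: " + structName);
            };
        }
        return () -> size;
    }

    static Integer resolve(String structName) {
        Integer size = KNOWN_LAYOUTS.get(structName);
        if (size == null) {
            throw new UnsupportedOperationException("partial layout: " + structName);
        }
        return size;
    }

    // Returns true if dereferencing the handle does not throw.
    static boolean derefSucceeds(Supplier<Integer> handle) {
        try {
            handle.get();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        KNOWN_LAYOUTS.clear();
        Supplier<Integer> lazy = lazyHandle("Point");
        Supplier<Integer> eager = eagerHandle("Point");
        // The full layout only becomes known after both handles were created.
        KNOWN_LAYOUTS.put("Point", 8);
        System.out.println("lazy deref ok:  " + derefSucceeds(lazy));
        System.out.println("eager deref ok: " + derefSucceeds(eager));
    }
}
```

In this toy model the lazy handle succeeds and the eager one does not, which is the kind of behavioural difference worth checking for in the real binder.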
On 21/05/2019 14:41, Jorn Vernee wrote:
> Hi,
>
> After the recent string of benchmarking [1], I've arrived at 2
> optimizations to improve the speed of the measured code path.
>
> 1.) Specialization of Struct getter MethodHandles per struct class.
> 2.) Implementation of RuntimeSupport::casterImpl that does a fused
> cast and offset operation, to avoid creating multiple Pointer objects.
>
> The benchmark:
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
> The optimizations:
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>
> I've split these into 2 so that it's easier to run the benchmarks with
> and without the optimizations. (The benchmark uses OpenJDK's built-in
> microbenchmark framework [2].)
>
> Since we're now more eagerly instantiating the struct impl class I had
> to work around partial struct types, since spinning the impl requires
> a non-partial type and now we're spinning the impl when creating the
> LayoutType for the struct, as opposed to on the first dereference. To
> do this I'm detecting whether the struct is partial in
> LayoutType.ofStruct, and using a Reference.OfGrumpy in the case where
> it cannot be resolved. Tbh, I think this makes things a little more
> clear as well, as far as where/how the exception for deref of a partial
> type is thrown.
>
> Results on my machine before the optimization are:
>
> Benchmark                        Mode  Cnt    Score    Error  Units
> GetStruct.jni_baseline           avgt   50   14.204 ±  0.566  ns/op
> GetStruct.panama_get_both        avgt   50  507.638 ± 19.462  ns/op
> GetStruct.panama_get_fieldonly   avgt   50   90.236 ± 11.027  ns/op
> GetStruct.panama_get_structonly  avgt   50  370.783 ± 13.744  ns/op
>
> And after:
>
> Benchmark                        Mode  Cnt    Score    Error  Units
> GetStruct.jni_baseline           avgt   50   13.941 ±  0.485  ns/op
> GetStruct.panama_get_both        avgt   50   41.199 ±  1.632  ns/op
> GetStruct.panama_get_fieldonly   avgt   50   33.432 ±  1.889  ns/op
> GetStruct.panama_get_structonly  avgt   50   13.469 ±  0.781  ns/op
>
> Here panama_get_structonly corresponds to 1., and
> panama_get_fieldonly corresponds to 2. In total, panama_get_both is
> about 12x faster (507.638 ns/op down to 41.199 ns/op).
>
> Thanks,
> Jorn
>
> [1] :
> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
> [2] : https://openjdk.java.net/jeps/230
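As an illustration of optimization 2 in the quoted message (a fused cast and offset that avoids creating multiple Pointer objects), the following is a minimal sketch in plain Java. It is a toy model only: the Ptr class, its methods, and the allocation counter are hypothetical and are not the Panama Pointer or RuntimeSupport::casterImpl API.

```java
// Hypothetical toy model; these types are not the Panama Pointer or RuntimeSupport API.
final class FusedCastSketch {
    // Counts how many Ptr objects have been constructed since the last reset.
    static int allocations = 0;

    static final class Ptr {
        final long address;
        final String type;

        Ptr(long address, String type) {
            this.address = address;
            this.type = type;
            allocations++;
        }

        // Two separate steps: each one creates a new Ptr.
        Ptr cast(String newType) {
            return new Ptr(address, newType);
        }

        Ptr offset(long bytes) {
            return new Ptr(address + bytes, type);
        }

        // Fused step: the same result with a single new Ptr.
        Ptr castAndOffset(String newType, long bytes) {
            return new Ptr(address + bytes, newType);
        }
    }

    public static void main(String[] args) {
        Ptr struct = new Ptr(0x1000L, "Point");

        allocations = 0;
        Ptr twoStep = struct.cast("int").offset(4);
        int twoStepAllocations = allocations;

        allocations = 0;
        Ptr fused = struct.castAndOffset("int", 4);
        int fusedAllocations = allocations;

        System.out.println("two-step allocations: " + twoStepAllocations);
        System.out.println("fused allocations:    " + fusedAllocations);
        System.out.println("same address: " + (twoStep.address == fused.address));
    }
}
```

Both paths end at the same address and type; the fused path simply skips the intermediate object, which matters on a hot getter path.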