[foreign] RFR 8224481: Optimize struct getter and field getter paths.

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue May 21 18:09:20 UTC 2019


Looks good, although I'm a bit worried about the change in semantics 
w.r.t. eager instantiation. The binder will create a lot of LayoutTypes 
when generating the implementation - I wonder whether there were cases before 
where we created a partial layout type, which then got resolved 
correctly by the time it was dereferenced (since we do another resolve 
lazily in StructImplGenerator [1]).

[1] - 
http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
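
To make the concern concrete, here is a minimal sketch of the two schemes. It is only an illustration under stated assumptions: StructRef, StructRefs and the isComplete callback are hypothetical stand-ins, not the actual LayoutType or StructImplGenerator code. It shows a layout that is partial when the reference is created but complete by the time it is dereferenced.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a struct accessor; not a Panama type.
interface StructRef {
    String deref();
}

final class StructRefs {
    private StructRefs() {}

    // Eager scheme: decide once, at creation time. If the layout is still
    // partial, hand back a "grumpy" reference that throws on every deref.
    static StructRef eager(String name, BooleanSupplier isComplete) {
        if (isComplete.getAsBoolean()) {
            return () -> "impl of " + name;
        }
        return () -> {
            throw new UnsupportedOperationException("partial type: " + name);
        };
    }

    // Lazy scheme: check completeness again on dereference, and cache the
    // result once resolution succeeds.
    static StructRef lazy(String name, BooleanSupplier isComplete) {
        return new StructRef() {
            private String impl; // set once resolution succeeds

            @Override
            public String deref() {
                if (impl == null) {
                    if (!isComplete.getAsBoolean()) {
                        throw new UnsupportedOperationException("partial type: " + name);
                    }
                    impl = "impl of " + name;
                }
                return impl;
            }
        };
    }
}
```

In this sketch, if isComplete flips to true between creation and the first deref, the lazy reference succeeds while the eager one keeps throwing. Whether the binder ever hits that ordering in practice is exactly the open question.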


On 21/05/2019 14:41, Jorn Vernee wrote:
> Hi,
>
> After the recent string of benchmarking [1], I've arrived at 2 
> optimizations to improve the speed of the measured code path.
>
> 1.) Specialization of Struct getter MethodHandles per struct class.
> 2.) Implementation of RuntimeSupport::casterImpl that does a fused 
> cast and offset operation, to avoid creating multiple Pointer objects.
>
> The benchmark: 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
> The optimizations: 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>
> I've split these into 2 webrevs so that it's easier to run the 
> benchmarks with and without the optimizations. (The benchmark uses 
> OpenJDK's built-in microbenchmark framework [2].)
>
> Since we're now more eagerly instantiating the struct impl class I had 
> to work around partial struct types, since spinning the impl requires 
> a non-partial type and now we're spinning the impl when creating the 
> LayoutType for the struct, as opposed to on the first dereference. To 
> do this I'm detecting whether the struct is partial in 
> LayoutType.ofStruct, and using a Reference.OfGrumpy in the case where 
> it cannot be resolved. Tbh, I think this also makes it a little 
> clearer where and how the exception for deref of a partial type is 
> thrown.
>
> Results on my machine before the optimization are:
>
> Benchmark                        Mode  Cnt    Score    Error  Units
> GetStruct.jni_baseline           avgt   50   14.204 ±  0.566  ns/op
> GetStruct.panama_get_both        avgt   50  507.638 ± 19.462  ns/op
> GetStruct.panama_get_fieldonly   avgt   50   90.236 ± 11.027  ns/op
> GetStruct.panama_get_structonly  avgt   50  370.783 ± 13.744  ns/op
>
> And after:
>
> Benchmark                        Mode  Cnt   Score   Error  Units
> GetStruct.jni_baseline           avgt   50  13.941 ± 0.485  ns/op
> GetStruct.panama_get_both        avgt   50  41.199 ± 1.632  ns/op
> GetStruct.panama_get_fieldonly   avgt   50  33.432 ± 1.889  ns/op
> GetStruct.panama_get_structonly  avgt   50  13.469 ± 0.781  ns/op
>
> Here panama_get_structonly corresponds to 1., and 
> panama_get_fieldonly corresponds to 2., for a total speedup of about 
> 12x on panama_get_both.
>
> Thanks,
> Jorn
>
> [1] : 
> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
> [2] : https://openjdk.java.net/jeps/230
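
As a side note on point 2.) of the quoted message, the idea of fusing the cast and the offset can be sketched roughly as below. This is an assumption-laden toy model: SketchPointer, its fields and the allocation counter are hypothetical and are not the Pointer or RuntimeSupport::casterImpl code from the webrev. It only shows why doing both steps at once creates one object instead of two.

```java
// Hypothetical pointer model; these names are not the Panama API.
final class SketchPointer {
    static int allocations = 0; // counts objects created, for illustration only

    final long address;
    final String type;

    SketchPointer(long address, String type) {
        this.address = address;
        this.type = type;
        allocations++;
    }

    // Two separate steps: each one allocates a new pointer object.
    SketchPointer cast(String newType) {
        return new SketchPointer(address, newType);
    }

    SketchPointer offset(long bytes) {
        return new SketchPointer(address + bytes, type);
    }

    // Fused step: same resulting address and type, a single allocation.
    SketchPointer castAndOffset(String newType, long bytes) {
        return new SketchPointer(address + bytes, newType);
    }
}
```

Calling base.cast("int").offset(8) and base.castAndOffset("int", 8) yields pointers with the same address and type in this model; the fused call just skips the intermediate object.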

