[foreign] RFR 8224481: Optimize struct getter and field getter paths.

Jorn Vernee jbvernee at xs4all.nl
Tue May 21 19:16:25 UTC 2019


Although, now that you bring it up, I tried re-running some of the 
samples (hadn't done that yet), and I'm seeing some infinite recursion. 
This is seemingly caused by a circular type reference (e.g. linked 
list). i.e. to spin the impl of an accessor we need the LayoutType of 
the struct itself, which then tries to spin the impl again, and so on. I 
guess this isn't a test case in our suite yet...

I'll look into this.

Thanks,
Jorn

Jorn Vernee schreef op 2019-05-21 21:06:
> Since we have the resolution context for NativeHeader, AFAIK there is
> no more difference between the resolution call done by
> StructImpleGenerator, and the one done by LayoutTypeImpl.ofStruct. So
> I don't think there are any more cases where we would have succeeded
> to resolve the Struct layout be delaying spinning the impl. At least
> the tests haven't caught such a case.
> 
> The other thing is that the partial layout for the getter is caught in
> StructImplGenerator, but for the setter it's caught when calling
> bitSize on Unresolved. Saying layouts should be able to be resolved
> when calling LayoutType.ofStruct means we can use References.OfGrumpy,
> which makes the two more uniform.
> 
> I have some ideas for keeping the lazy init semantics, but it's a bit
> more complex (using a MutableCallSite to mimic indy), and I'm not sure
> it will work as well.
> 
> And, well, there was some talk about eagerly spinning the
> implementations any ways :)
> 
> Jorn
> 
> Maurizio Cimadamore schreef op 2019-05-21 20:09:
>> Looks good, although I'm a bit worried about the change in semantics
>> w.r.t. eager instantiation. The binder will create a lot of
>> LayoutTypes when generating the implementation - I wonder there were
>> cases before where we created a partial layout type, which then got
>> resolved correctly by the time it was dereferenced (since we do
>> another resolve lazily in StructImplGenerator [1]).
>> 
>> [1] -
>> http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
>> 
>> 
>> On 21/05/2019 14:41, Jorn Vernee wrote:
>>> Hi,
>>> 
>>> After the recent string of benchmarking [1], I've arrived at 2 
>>> optimizations to improve the speed of the measured code path.
>>> 
>>> 1.) Specialization of Struct getter MethodHandles per struct class.
>>> 2.) Implementation of RuntimeSupport::casterImpl that does a fused 
>>> cast and offset operation, to avoid creating multiple Pointer 
>>> objects.
>>> 
>>> The benchmark: 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
>>> The optimizations: 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>>> 
>>> I've split these into 2 so that it's easier to run the benchmarks 
>>> with and without the optimizations. (benchmark uses the OpenJDK's 
>>> builtin framework [2]).
>>> 
>>> Since we're now more eagerly instantiating the struct impl class I 
>>> had to work around partial struct types, since spinning the impl 
>>> requires a non-partial type and now we're spinning the impl when 
>>> creating the LayouType for the struct, as opposed to on the first 
>>> dereference. To do this I'm detecting whether the struct is partial 
>>> in LayoutType.ofStruct, and using a Reference.OfGrumpy in the case 
>>> where it can not be resolved. Tbh, I think this makes things a little 
>>> more clear as well as far as where/how the exception for deref of a 
>>> partial type is thrown.
>>> 
>>> Results on my machine before the optimization are:
>>> 
>>> Benchmark                        Mode  Cnt    Score    Error Units
>>> GetStruct.jni_baseline           avgt   50   14.204 ▒  0.566 ns/op
>>> GetStruct.panama_get_both        avgt   50  507.638 ▒ 19.462 ns/op
>>> GetStruct.panama_get_fieldonly   avgt   50   90.236 ▒ 11.027 ns/op
>>> GetStruct.panama_get_structonly  avgt   50  370.783 ▒ 13.744 ns/op
>>> 
>>> And after:
>>> 
>>> Benchmark                        Mode  Cnt   Score   Error  Units
>>> GetStruct.jni_baseline           avgt   50  13.941 ▒ 0.485  ns/op
>>> GetStruct.panama_get_both        avgt   50  41.199 ▒ 1.632  ns/op
>>> GetStruct.panama_get_fieldonly   avgt   50  33.432 ▒ 1.889  ns/op
>>> GetStruct.panama_get_structonly  avgt   50  13.469 ▒ 0.781  ns/op
>>> 
>>> Where panama_get_structonly corresponds to 1., and 
>>> panama_get_fieldonly corresponds to 2. For a total of about 12x 
>>> speedup.
>>> 
>>> Thanks,
>>> Jorn
>>> 
>>> [1] : 
>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
>>> [2] : https://openjdk.java.net/jeps/230


More information about the panama-dev mailing list