[foreign] RFR 8224481: Optimize struct getter and field getter paths.
Jorn Vernee
jbvernee at xs4all.nl
Wed May 22 10:51:15 UTC 2019
Ah, good point.
> ClassValue<MH> -> MH -> StructImpl -> LayoutType -> Reference ->
> ClassValue<MH>
I don't think that last link is quite right though. The LayoutType
references the anonymous Reference class, not References.OfStruct (which
contains the ClassValue).
I think it would be:
User Code -> LayoutType -> anonymous Reference -> getter MH ->
StructImpl -> LayoutType
There could still be a cycle there, but the whole cycle can be GC'd once
the reference from user code goes away.
Jorn
Maurizio Cimadamore schreef op 2019-05-22 12:37:
> Looks good - yesterday I was looking at this discussion:
>
> http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-January/006563.html
>
> I hope we don't run in the condition described there - e.g. that
> there's no strong reachability from the MH we're caching back to the
> static ClassValue instance - because, if that would be the case I
> think that would prevent class unloading.
>
> The problem is that the MethodHandle we cache refers to the stuct impl
> class, and I believe that class refers to some LayoutTypes on its own,
> which have a Reference inside, so it would be:
>
> ClassValue<MH> -> MH -> StructImpl -> LayoutType -> Reference ->
> ClassValue<MH>
>
> Sundar can you double check?
>
> Maurizio
>
> On 22/05/2019 10:56, Jorn Vernee wrote:
>> Good suggestion! This solves the problem, is nice and simple, and
>> keeps the same times in the benchmark.
>>
>> Updated webrev:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.02/
>>
>> (only changes to References.java)
>>
>> I've added a test for the failure. I think that can be included as
>> well? I re-ran the samples I have as well, and this time it's all
>> green.
>>
>> Thanks,
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-05-22 01:15:
>>> On 21/05/2019 20:16, Jorn Vernee wrote:
>>>> Although, now that you bring it up, I tried re-running some of the
>>>> samples (hadn't done that yet), and I'm seeing some infinite
>>>> recursion. This is seemingly caused by a circular type reference
>>>> (e.g. linked list). i.e. to spin the impl of an accessor we need the
>>>> LayoutType of the struct itself, which then tries to spin the impl
>>>> again, and so on. I guess this isn't a test case in our suite yet...
>>>>
>>>> I'll look into this.
>>>
>>> Good detective work! I guess it would make sense to try and reduce it
>>> down to a simpler test, and push the test first.
>>>
>>> Where I was going with this is - your patch effectively made the lazy
>>> resolution inside StructImplGenerator useless. If we really want to
>>> explore that option, then we should, I think, remove all lazy
>>> resolution sites and see what happens. It is possible that we don't
>>> rely so much on laziness as we did in the past (we did some fixes few
>>> months ago which stabilized resolution quite a bit) - in which case
>>> we
>>> can remove the resolution requests, although - I have to admit - I'm
>>> a
>>> bit skeptical. After all all you need it something like this (as you
>>> say):
>>>
>>> struct foo {
>>> struct foo *next;
>>> }
>>>
>>> Which is kind of the killer app for unresolved layouts in the first
>>> place.
>>>
>>> This is translated into a struct interface which has a getter of
>>> Pointer<foo>. To generate the getter you need to compute its
>>> LayoutType which is a pointer LayoutType, so you have to compute the
>>> pointee LayoutType which brings you back where you started (the whole
>>> 'foo' LayoutType). In other words, since now the creation of
>>> LayoutType<foo> requires the generation of the struct impl for 'foo'
>>> and since that depends (indirectly, through the pointer getter) on
>>> being able to produce a LayoutType<foo>, you get a circularity.
>>>
>>> One thing we could try is - instead of eagerly creating the struct
>>> impl, why don't we let the Reference.OfStruct having some mutable
>>> state in it? That is, we could start off with Reference getter which
>>> does the expensive refelective lookup - but then, once it has
>>> discovered the constructor MH, it can stash it in some field (which
>>> is
>>> private to that reference object) and use it later if the getter is
>>> used again. Then, you probably still need a ClassValue to stash a
>>> mapping between a Class and its Reference.OfStruct; but it seems like
>>> this could fit in more naturally?
>>>
>>> Maurizio
>>>
>>>>
>>>> Thanks,
>>>> Jorn
>>>>
>>>> Jorn Vernee schreef op 2019-05-21 21:06:
>>>>> Since we have the resolution context for NativeHeader, AFAIK there
>>>>> is
>>>>> no more difference between the resolution call done by
>>>>> StructImpleGenerator, and the one done by LayoutTypeImpl.ofStruct.
>>>>> So
>>>>> I don't think there are any more cases where we would have
>>>>> succeeded
>>>>> to resolve the Struct layout be delaying spinning the impl. At
>>>>> least
>>>>> the tests haven't caught such a case.
>>>>>
>>>>> The other thing is that the partial layout for the getter is caught
>>>>> in
>>>>> StructImplGenerator, but for the setter it's caught when calling
>>>>> bitSize on Unresolved. Saying layouts should be able to be resolved
>>>>> when calling LayoutType.ofStruct means we can use
>>>>> References.OfGrumpy,
>>>>> which makes the two more uniform.
>>>>>
>>>>> I have some ideas for keeping the lazy init semantics, but it's a
>>>>> bit
>>>>> more complex (using a MutableCallSite to mimic indy), and I'm not
>>>>> sure
>>>>> it will work as well.
>>>>>
>>>>> And, well, there was some talk about eagerly spinning the
>>>>> implementations any ways :)
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-05-21 20:09:
>>>>>> Looks good, although I'm a bit worried about the change in
>>>>>> semantics
>>>>>> w.r.t. eager instantiation. The binder will create a lot of
>>>>>> LayoutTypes when generating the implementation - I wonder there
>>>>>> were
>>>>>> cases before where we created a partial layout type, which then
>>>>>> got
>>>>>> resolved correctly by the time it was dereferenced (since we do
>>>>>> another resolve lazily in StructImplGenerator [1]).
>>>>>>
>>>>>> [1] -
>>>>>> http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
>>>>>> On 21/05/2019 14:41, Jorn Vernee wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> After the recent string of benchmarking [1], I've arrived at 2
>>>>>>> optimizations to improve the speed of the measured code path.
>>>>>>>
>>>>>>> 1.) Specialization of Struct getter MethodHandles per struct
>>>>>>> class.
>>>>>>> 2.) Implementation of RuntimeSupport::casterImpl that does a
>>>>>>> fused cast and offset operation, to avoid creating multiple
>>>>>>> Pointer objects.
>>>>>>>
>>>>>>> The benchmark:
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
>>>>>>> The optimizations:
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>>>>>>>
>>>>>>> I've split these into 2 so that it's easier to run the benchmarks
>>>>>>> with and without the optimizations. (benchmark uses the OpenJDK's
>>>>>>> builtin framework [2]).
>>>>>>>
>>>>>>> Since we're now more eagerly instantiating the struct impl class
>>>>>>> I had to work around partial struct types, since spinning the
>>>>>>> impl requires a non-partial type and now we're spinning the impl
>>>>>>> when creating the LayouType for the struct, as opposed to on the
>>>>>>> first dereference. To do this I'm detecting whether the struct is
>>>>>>> partial in LayoutType.ofStruct, and using a Reference.OfGrumpy in
>>>>>>> the case where it can not be resolved. Tbh, I think this makes
>>>>>>> things a little more clear as well as far as where/how the
>>>>>>> exception for deref of a partial type is thrown.
>>>>>>>
>>>>>>> Results on my machine before the optimization are:
>>>>>>>
>>>>>>> Benchmark Mode Cnt Score Error Units
>>>>>>> GetStruct.jni_baseline avgt 50 14.204 ▒ 0.566 ns/op
>>>>>>> GetStruct.panama_get_both avgt 50 507.638 ▒ 19.462
>>>>>>> ns/op
>>>>>>> GetStruct.panama_get_fieldonly avgt 50 90.236 ▒ 11.027
>>>>>>> ns/op
>>>>>>> GetStruct.panama_get_structonly avgt 50 370.783 ▒ 13.744
>>>>>>> ns/op
>>>>>>>
>>>>>>> And after:
>>>>>>>
>>>>>>> Benchmark Mode Cnt Score Error Units
>>>>>>> GetStruct.jni_baseline avgt 50 13.941 ▒ 0.485 ns/op
>>>>>>> GetStruct.panama_get_both avgt 50 41.199 ▒ 1.632 ns/op
>>>>>>> GetStruct.panama_get_fieldonly avgt 50 33.432 ▒ 1.889 ns/op
>>>>>>> GetStruct.panama_get_structonly avgt 50 13.469 ▒ 0.781 ns/op
>>>>>>>
>>>>>>> Where panama_get_structonly corresponds to 1., and
>>>>>>> panama_get_fieldonly corresponds to 2. For a total of about 12x
>>>>>>> speedup.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jorn
>>>>>>>
>>>>>>> [1] :
>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
>>>>>>> [2] : https://openjdk.java.net/jeps/230
More information about the panama-dev
mailing list