[foreign] RFR 8224481: Optimize struct getter and field getter paths.
Jorn Vernee
jbvernee at xs4all.nl
Thu May 23 12:38:29 UTC 2019
Response inline....
Maurizio Cimadamore schreef op 2019-05-23 00:28:
> I did some more analysis on the ClassValue issue and I'm now convinced
> that what we are doing is _not_ problematic.
>
> What we really care about here is that, if we create a
> Reference.OfStruct for class Foo, we don't want the ClassValue we're
> using to cache that reference to prevent unloading of Foo. That is
> slightly different problem than the one described in [1]. There, the
> issue is that the storage associated with ClassValue (which lives
> inside Class objects) keeps growing indefinitively, in case where the
> computed values keep strong references to the ClassValue itself. This
> is due to the way in which ClassValue behaves.
>
> A ClassValue is not an ordinary map - rather, when you call 'get' on a
> ClassValue with a given class C, you really ask the Class object for C
> for its ClassValue storage (a so called ClassValueMap abstraction).
> This map is essentially a WeakHashMap<Identity, Entry>, where Identity
> is a field of the ClassValue uniquely identifying it, whereas Entry
> contains the computed value associated with that class and ClassValue
> instance (the Entry class has a lot of extra complexity to deal with
> versioning, which is irrelevant here).
>
> So, if the ClassValue instance goes away, the fact that we're using a
> WeakHashMap here, allows the map to shrink in size. Of course, for
> this to happen, you don't want to have a strong reference from Entry
> (that is, from the computed value) back to the ClassValue instance -
> as in doing so you will prevent collection of the WeakHashMap entries.
>
> The bug in [1] shows that, when that happens, it is essentially
> possible to grow the WeakHashMap attached to a class object at will,
> until an OOME is produced.
>
> But in our case we're not concerned with the fact that we keep adding
> multiple ClassValue to the _same_ class object; it's actually the
> opposite: we have a single ClassValue (in References.OfStruct), and
> many classes. In such a case, when the class goes away (because its
> classloader goes), it will just go away; there will be nothing
> preventing the collection of that class.
>
> Attached is a test (with two files, Test.java and Dummy.java) - Test
> creates a new class loader, loads Dummy in it, and then stash a value
> for the Dummy class into a shared ClassValue. To make things as nasty
> as possible, the value we're storing has strong references to both the
> Dummy class and the ClassValue itself. But, as soon as the loader is
> closed, the finalizer is run as expected and memory usage remains
> under control.
Thanks for the extensive research, and for explaining it! It's good to
hear that using ClassValue won't be an issue for us.
I tried out the test, and I'm also seeing the finalizer being run.
> So, popping back to our enhancements, I think what the patch does is
> legit. In terms of the code, I don't like how the code made OfStruct
> _not_ a Reference, and is instead using OfStruct as a holder for some
> helper functionalities, plus the cache, whereas the real reference is
> an anonymous class generated inside the computeValue() method.
>
> It seems to me that we could have Reference.OfStruct keep being a
> Reference, have a constructor that takes a Class object, and then have
> a static ClassValue field in References which, upon computeValue
> creates a new instance of Reference.OfStruct for that class. I think
> the implementation would be a lot more linear that way (unless I'm
> missing something).
Yeah, I think doing that would make more sense. It would also help show
what fields a struct Reference actually has.
I've also added @Stable to the MethodHandle field (as suggested in your
other email) and re-ran the benchmark, but did not see an obvious
performance increase. I looked at the profile for
`panama_get_structonly`, but nothing really stands out to me:
30.37% c2, level 4
org.sample.generated.GetStruct_panama_get_structonly_jmhTest::panama_get_structonly_avgt_jmhStub,
version 746
25.90% Unknown, level 0 java.lang.invoke.MethodHandle::invokeBasic,
version 102
16.95% c2, level 4
java.lang.invoke.LambdaForm$MH.0x0000000800c0a840::invoke, version 713
13.50% c2, level 4 org.sample.GetStruct::panama_get_structonly,
version 711
It looks like most time is spent on JMH overhead.
Updated webrev with your suggestions:
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.04/
(Only changes to References)
Jorn
> Cheers
> Maurizio
>
> [1] - https://bugs.openjdk.java.net/browse/JDK-8136353
>
> On 22/05/2019 16:30, Maurizio Cimadamore wrote:
>> Looks good - module pending questions on use of ClassValue.
>>
>> I think we should come up with some kind of test case that shows the
>> ClassValue issue and then test with different approaches
>>
>> Maurizio
>>
>> On 22/05/2019 16:09, Jorn Vernee wrote:
>>> Coming back to this once more,
>>>
>>> I finally got my profiler working (after setting up a separate
>>> project) and saw a lot of time spent getting the field offset:
>>>
>>> 37.00% c2, level 4
>>> jdk.internal.foreign.LayoutPaths$$Lambda$66.0x0000000800c05040::getAsLong,
>>> version 691
>>> 30.19% c2, level 4
>>> jdk.internal.foreign.RuntimeSupport::casterImpl, version 724
>>> 22.12% c2, level 4
>>> org.sample.generated.GetStruct_panama_get_fieldonly_jmhTest::panama_get_fieldonly_avgt_jmhStub,
>>> version 746
>>> ...
>>>
>>> i.e. the call to LayoutPath.offset() in RuntimeSupport::casterImpl
>>> can not be inlined, and we're re-computing the field offset over and
>>> over again.
>>>
>>> The fix for this is pretty simple; instead of passing the LayoutPath
>>> to the caster, we pre-compute the offset and then pass that. (This
>>> should be constant, right?).
>>>
>>> This yields some more speedup:
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> GetStruct.jni_baseline avgt 50 13.337 ▒ 0.251 ns/op
>>> GetStruct.panama_get_both avgt 50 17.026 ▒ 0.458 ns/op
>>> GetStruct.panama_get_fieldonly avgt 50 7.796 ▒ 0.166 ns/op
>>> GetStruct.panama_get_structonly avgt 50 11.863 ▒ 0.358 ns/op
>>>
>>> Putting us pretty much even with jni_baseline.
>>>
>>> Updated Webrev:
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.03/
>>>
>>> (Only changes are to RuntimeSupport)
>>>
>>> Cheers,
>>> Jorn
>>>
>>> Jorn Vernee schreef op 2019-05-22 12:51:
>>>> Ah, good point.
>>>>
>>>>> ClassValue<MH> -> MH -> StructImpl -> LayoutType -> Reference ->
>>>>> ClassValue<MH>
>>>>
>>>> I don't think that last link is quite right though. The LayoutType
>>>> references the anonymous Reference class, not References.OfStruct
>>>> (which contains the ClassValue).
>>>>
>>>> I think it would be:
>>>>
>>>> User Code -> LayoutType -> anonymous Reference -> getter MH ->
>>>> StructImpl -> LayoutType
>>>>
>>>> There could still be a cycle there, but the whole cycle can be GC'd
>>>> once the reference from user code goes away.
>>>>
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-05-22 12:37:
>>>>> Looks good - yesterday I was looking at this discussion:
>>>>>
>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-January/006563.html
>>>>> I hope we don't run in the condition described there - e.g. that
>>>>> there's no strong reachability from the MH we're caching back to
>>>>> the
>>>>> static ClassValue instance - because, if that would be the case I
>>>>> think that would prevent class unloading.
>>>>>
>>>>> The problem is that the MethodHandle we cache refers to the stuct
>>>>> impl
>>>>> class, and I believe that class refers to some LayoutTypes on its
>>>>> own,
>>>>> which have a Reference inside, so it would be:
>>>>>
>>>>> ClassValue<MH> -> MH -> StructImpl -> LayoutType -> Reference ->
>>>>> ClassValue<MH>
>>>>>
>>>>> Sundar can you double check?
>>>>>
>>>>> Maurizio
>>>>>
>>>>> On 22/05/2019 10:56, Jorn Vernee wrote:
>>>>>> Good suggestion! This solves the problem, is nice and simple, and
>>>>>> keeps the same times in the benchmark.
>>>>>>
>>>>>> Updated webrev:
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.02/
>>>>>>
>>>>>> (only changes to References.java)
>>>>>>
>>>>>> I've added a test for the failure. I think that can be included as
>>>>>> well? I re-ran the samples I have as well, and this time it's all
>>>>>> green.
>>>>>>
>>>>>> Thanks,
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-05-22 01:15:
>>>>>>> On 21/05/2019 20:16, Jorn Vernee wrote:
>>>>>>>> Although, now that you bring it up, I tried re-running some of
>>>>>>>> the samples (hadn't done that yet), and I'm seeing some infinite
>>>>>>>> recursion. This is seemingly caused by a circular type reference
>>>>>>>> (e.g. linked list). i.e. to spin the impl of an accessor we need
>>>>>>>> the LayoutType of the struct itself, which then tries to spin
>>>>>>>> the impl again, and so on. I guess this isn't a test case in our
>>>>>>>> suite yet...
>>>>>>>>
>>>>>>>> I'll look into this.
>>>>>>>
>>>>>>> Good detective work! I guess it would make sense to try and
>>>>>>> reduce it
>>>>>>> down to a simpler test, and push the test first.
>>>>>>>
>>>>>>> Where I was going with this is - your patch effectively made the
>>>>>>> lazy
>>>>>>> resolution inside StructImplGenerator useless. If we really want
>>>>>>> to
>>>>>>> explore that option, then we should, I think, remove all lazy
>>>>>>> resolution sites and see what happens. It is possible that we
>>>>>>> don't
>>>>>>> rely so much on laziness as we did in the past (we did some fixes
>>>>>>> few
>>>>>>> months ago which stabilized resolution quite a bit) - in which
>>>>>>> case we
>>>>>>> can remove the resolution requests, although - I have to admit -
>>>>>>> I'm a
>>>>>>> bit skeptical. After all all you need it something like this (as
>>>>>>> you
>>>>>>> say):
>>>>>>>
>>>>>>> struct foo {
>>>>>>> struct foo *next;
>>>>>>> }
>>>>>>>
>>>>>>> Which is kind of the killer app for unresolved layouts in the
>>>>>>> first place.
>>>>>>>
>>>>>>> This is translated into a struct interface which has a getter of
>>>>>>> Pointer<foo>. To generate the getter you need to compute its
>>>>>>> LayoutType which is a pointer LayoutType, so you have to compute
>>>>>>> the
>>>>>>> pointee LayoutType which brings you back where you started (the
>>>>>>> whole
>>>>>>> 'foo' LayoutType). In other words, since now the creation of
>>>>>>> LayoutType<foo> requires the generation of the struct impl for
>>>>>>> 'foo'
>>>>>>> and since that depends (indirectly, through the pointer getter)
>>>>>>> on
>>>>>>> being able to produce a LayoutType<foo>, you get a circularity.
>>>>>>>
>>>>>>> One thing we could try is - instead of eagerly creating the
>>>>>>> struct
>>>>>>> impl, why don't we let the Reference.OfStruct having some mutable
>>>>>>> state in it? That is, we could start off with Reference getter
>>>>>>> which
>>>>>>> does the expensive refelective lookup - but then, once it has
>>>>>>> discovered the constructor MH, it can stash it in some field
>>>>>>> (which is
>>>>>>> private to that reference object) and use it later if the getter
>>>>>>> is
>>>>>>> used again. Then, you probably still need a ClassValue to stash a
>>>>>>> mapping between a Class and its Reference.OfStruct; but it seems
>>>>>>> like
>>>>>>> this could fit in more naturally?
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Jorn Vernee schreef op 2019-05-21 21:06:
>>>>>>>>> Since we have the resolution context for NativeHeader, AFAIK
>>>>>>>>> there is
>>>>>>>>> no more difference between the resolution call done by
>>>>>>>>> StructImpleGenerator, and the one done by
>>>>>>>>> LayoutTypeImpl.ofStruct. So
>>>>>>>>> I don't think there are any more cases where we would have
>>>>>>>>> succeeded
>>>>>>>>> to resolve the Struct layout be delaying spinning the impl. At
>>>>>>>>> least
>>>>>>>>> the tests haven't caught such a case.
>>>>>>>>>
>>>>>>>>> The other thing is that the partial layout for the getter is
>>>>>>>>> caught in
>>>>>>>>> StructImplGenerator, but for the setter it's caught when
>>>>>>>>> calling
>>>>>>>>> bitSize on Unresolved. Saying layouts should be able to be
>>>>>>>>> resolved
>>>>>>>>> when calling LayoutType.ofStruct means we can use
>>>>>>>>> References.OfGrumpy,
>>>>>>>>> which makes the two more uniform.
>>>>>>>>>
>>>>>>>>> I have some ideas for keeping the lazy init semantics, but it's
>>>>>>>>> a bit
>>>>>>>>> more complex (using a MutableCallSite to mimic indy), and I'm
>>>>>>>>> not sure
>>>>>>>>> it will work as well.
>>>>>>>>>
>>>>>>>>> And, well, there was some talk about eagerly spinning the
>>>>>>>>> implementations any ways :)
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> Maurizio Cimadamore schreef op 2019-05-21 20:09:
>>>>>>>>>> Looks good, although I'm a bit worried about the change in
>>>>>>>>>> semantics
>>>>>>>>>> w.r.t. eager instantiation. The binder will create a lot of
>>>>>>>>>> LayoutTypes when generating the implementation - I wonder
>>>>>>>>>> there were
>>>>>>>>>> cases before where we created a partial layout type, which
>>>>>>>>>> then got
>>>>>>>>>> resolved correctly by the time it was dereferenced (since we
>>>>>>>>>> do
>>>>>>>>>> another resolve lazily in StructImplGenerator [1]).
>>>>>>>>>>
>>>>>>>>>> [1] -
>>>>>>>>>> http://hg.openjdk.java.net/panama/dev/file/5ea3089be5ac/src/java.base/share/classes/jdk/internal/foreign/StructImplGenerator.java#l52
>>>>>>>>>> On 21/05/2019 14:41, Jorn Vernee wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> After the recent string of benchmarking [1], I've arrived at
>>>>>>>>>>> 2 optimizations to improve the speed of the measured code
>>>>>>>>>>> path.
>>>>>>>>>>>
>>>>>>>>>>> 1.) Specialization of Struct getter MethodHandles per struct
>>>>>>>>>>> class.
>>>>>>>>>>> 2.) Implementation of RuntimeSupport::casterImpl that does a
>>>>>>>>>>> fused cast and offset operation, to avoid creating multiple
>>>>>>>>>>> Pointer objects.
>>>>>>>>>>>
>>>>>>>>>>> The benchmark:
>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/bench/webrev.00/
>>>>>>>>>>> The optimizations:
>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8224481/opto/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> I've split these into 2 so that it's easier to run the
>>>>>>>>>>> benchmarks with and without the optimizations. (benchmark
>>>>>>>>>>> uses the OpenJDK's builtin framework [2]).
>>>>>>>>>>>
>>>>>>>>>>> Since we're now more eagerly instantiating the struct impl
>>>>>>>>>>> class I had to work around partial struct types, since
>>>>>>>>>>> spinning the impl requires a non-partial type and now we're
>>>>>>>>>>> spinning the impl when creating the LayouType for the struct,
>>>>>>>>>>> as opposed to on the first dereference. To do this I'm
>>>>>>>>>>> detecting whether the struct is partial in
>>>>>>>>>>> LayoutType.ofStruct, and using a Reference.OfGrumpy in the
>>>>>>>>>>> case where it can not be resolved. Tbh, I think this makes
>>>>>>>>>>> things a little more clear as well as far as where/how the
>>>>>>>>>>> exception for deref of a partial type is thrown.
>>>>>>>>>>>
>>>>>>>>>>> Results on my machine before the optimization are:
>>>>>>>>>>>
>>>>>>>>>>> Benchmark Mode Cnt Score Error Units
>>>>>>>>>>> GetStruct.jni_baseline avgt 50 14.204 ▒ 0.566
>>>>>>>>>>> ns/op
>>>>>>>>>>> GetStruct.panama_get_both avgt 50 507.638 ▒ 19.462
>>>>>>>>>>> ns/op
>>>>>>>>>>> GetStruct.panama_get_fieldonly avgt 50 90.236 ▒ 11.027
>>>>>>>>>>> ns/op
>>>>>>>>>>> GetStruct.panama_get_structonly avgt 50 370.783 ▒ 13.744
>>>>>>>>>>> ns/op
>>>>>>>>>>>
>>>>>>>>>>> And after:
>>>>>>>>>>>
>>>>>>>>>>> Benchmark Mode Cnt Score Error Units
>>>>>>>>>>> GetStruct.jni_baseline avgt 50 13.941 ▒ 0.485
>>>>>>>>>>> ns/op
>>>>>>>>>>> GetStruct.panama_get_both avgt 50 41.199 ▒ 1.632
>>>>>>>>>>> ns/op
>>>>>>>>>>> GetStruct.panama_get_fieldonly avgt 50 33.432 ▒ 1.889
>>>>>>>>>>> ns/op
>>>>>>>>>>> GetStruct.panama_get_structonly avgt 50 13.469 ▒ 0.781
>>>>>>>>>>> ns/op
>>>>>>>>>>>
>>>>>>>>>>> Where panama_get_structonly corresponds to 1., and
>>>>>>>>>>> panama_get_fieldonly corresponds to 2. For a total of about
>>>>>>>>>>> 12x speedup.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>> [1] :
>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-May/005469.html
>>>>>>>>>>> [2] : https://openjdk.java.net/jeps/230
More information about the panama-dev
mailing list