[foreign-memaccess+abi] RFR: Add support for high-level functions to copy to and from Java arrays [v2]
Uwe Schindler
uschindler at openjdk.java.net
Tue Jun 22 11:25:40 UTC 2021
On Tue, 22 Jun 2021 07:33:34 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:
>>> Performance is much better, but not yet ideal!
>>>
>>> Maybe you should really think about implementing all those copy methods between arrays and segments without slicing and ideally also without wrapped segments.
>>>
>>> Would it be so complicated to just implement all the MemoryCopy methods like that?
>>
>> As you say, it is fairly easy to re-implement the various methods using unsafe directly. It does create a bit of a problem in terms of API stacking, in the sense that the `MemorySegment::copyFrom` method, while in principle a "primitive", cannot in reality act like one because of performance-related concerns. But perhaps this is just a transient concern: eventually we'll get primitive classes and we'll be able to avoid allocation w/o having to rely on escape analysis.
>>
>> I'll work to re-implement the various methods tomorrow.
>
> Hi @mcimadamore,
>
> I was thinking about that last night and I think I know what the issue is and why it affects Lucene. As you remember, I mentioned that with tiered compilation disabled and batch compilation enabled (the default for our benchmark, to get more predictable results), the amount of garbage produced, and the cost of collecting it, is horrible. Those are the first results above. As you can see, the copy methods are only optimized after 6 seconds. The total runtime of the benchmark was 73 seconds (50 seconds with our old code). So it looks like, for the first 6 seconds, the benchmark hammers the Lucene index with many parallel queries, creating incredible amounts of garbage (3 object instances for each memory copy) until compilation kicks in. This is completely new to Lucene, because its very low-level code is carefully written not to create any object instances: in our experience, escape analysis kicks in way too late. The readBytes() method is not called in a loop, but it is still called very often from a very complex code path, which makes it hard for the optimizer to kick in. What happens now: our benchmark uses many threads to execute full-text queries, and until the optimization kicks in, it creates millions of objects that have to be garbage collected. Due to the high load (many threads and expensive low-level query execution), the GC gets fewer CPU resources, fails to kick in early enough, and in the end wastes about 23 extra seconds cleaning up the useless garbage, possibly with stop-the-world phases.
>
> With tiered compilation enabled, compilation kicks in much earlier (after only half a second for C1 and after 1.2 seconds for C2). This avoids a lot of garbage (about 1/6s also in the output at end). The total runtime of the benchmark goes down from 73 seconds to approximately 55 seconds. So the garbage collector does not go crazy, but we still see a 10% slowdown for some queries, instead of the 40% slowdown we saw in the non-tiered case.
>
> IMHO, Panama internals should not rely on escape analysis kicking in early enough, and things like the copy methods should be implemented so that they do not produce too much garbage by default.
>
>> It does create a bit of a problem in terms of API stacking, in the sense that the MemorySegment::copyFrom method, while in principle a "primitive", cannot in reality act like one because of performance-related concerns.
>
> Maybe document in the API of MemoryCopy that those methods behave like the slicing code but are much more efficient.
>
> But I agree that you should really work on making escape analysis kick in earlier, especially for code that is not called in loops (like memory copy).
>
> Thanks,
> Uwe
> @uschindler I've just pushed a new iteration which uses unsafe directly when copying to/from arrays. Please give that a try - if numbers look good we will finalize this version.
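The difference between the slicing-based copy discussed in the quoted messages and a direct copy can be sketched as follows. This is a minimal illustration only: it uses `java.nio.ByteBuffer` (JDK 13 or later) as a stand-in for a memory segment, not the `jdk.incubator.foreign` API under review, and the class and method names are hypothetical. The slicing version creates intermediate view objects on every call, the kind of per-copy garbage described above, while the direct version performs one absolute bulk `get` and creates no intermediate objects itself.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration: ByteBuffer stands in for a memory segment.
public class CopySketch {

    // Slicing style: create a view of the source range and a view of the
    // destination array range, then copy between the two views. Each call
    // allocates two ByteBuffer objects that escape analysis may or may not
    // eliminate.
    static void sliceCopy(ByteBuffer src, int srcOffset, byte[] dst, int dstOffset, int length) {
        ByteBuffer srcView = src.slice(srcOffset, length);             // view object 1
        ByteBuffer dstView = ByteBuffer.wrap(dst, dstOffset, length);  // view object 2
        dstView.put(srcView);
    }

    // Direct style: a single absolute bulk get with explicit offsets. No view
    // objects are created, and the source buffer's position is not changed.
    static void directCopy(ByteBuffer src, int srcOffset, byte[] dst, int dstOffset, int length) {
        src.get(srcOffset, dst, dstOffset, length);
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocateDirect(16); // off-heap, like a mapped file
        for (int i = 0; i < 16; i++) {
            src.put(i, (byte) i);
        }
        byte[] a = new byte[8];
        byte[] b = new byte[8];
        sliceCopy(src, 4, a, 2, 5);
        directCopy(src, 4, b, 2, 5);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both methods produce the same bytes, which is why documenting the direct methods as "behaving like the slicing code" is a reasonable contract; the direct form simply does not depend on escape analysis to avoid allocation.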
There's something fishy:
Exception in thread "main" java.lang.UnsupportedOperationException: Attempt to write a read-only segment
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkAccess(AbstractMemorySegmentImpl.java:359)
    at jdk.incubator.foreign/jdk.incubator.foreign.MemoryCopy.copyToArray(MemoryCopy.java:108)
    at org.apache.lucene.store.MemorySegmentIndexInput.readBytes(MemorySegmentIndexInput.java:151)
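Judging only from the trace, `copyToArray` appears to apply a write-access check to the segment it is copying *from*, even though `readBytes` only reads from it (in Lucene's case presumably a read-only file mapping). That reading of the trace is an assumption. The sketch below uses an entirely hypothetical `Region` class, not the `jdk.internal.foreign` implementation, to show the access rule one would expect: copying into an array requires only read access on the region, and only copying from an array into the region requires write access.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical model of a memory region with an access mode; it is not the
// jdk.incubator.foreign implementation, only an illustration of the rule.
public class Region {
    private final byte[] storage;
    private final boolean readOnly;

    Region(byte[] storage, boolean readOnly) {
        this.storage = storage;
        this.readOnly = readOnly;
    }

    // Reading is permitted on any region; only the bounds are validated.
    private void checkRead(int offset, int length) {
        Objects.checkFromIndexSize(offset, length, storage.length);
    }

    // Writing additionally requires the region to be writable.
    private void checkWrite(int offset, int length) {
        if (readOnly) {
            throw new UnsupportedOperationException("Attempt to write a read-only region");
        }
        Objects.checkFromIndexSize(offset, length, storage.length);
    }

    // Region -> array: the region is only read, so only a read check applies.
    void copyToArray(int srcOffset, byte[] dst, int dstOffset, int length) {
        checkRead(srcOffset, length);
        Objects.checkFromIndexSize(dstOffset, length, dst.length);
        System.arraycopy(storage, srcOffset, dst, dstOffset, length);
    }

    // Array -> region: the region is written, so the write check applies.
    void copyFromArray(byte[] src, int srcOffset, int dstOffset, int length) {
        Objects.checkFromIndexSize(srcOffset, length, src.length);
        checkWrite(dstOffset, length);
        System.arraycopy(src, srcOffset, storage, dstOffset, length);
    }

    public static void main(String[] args) {
        Region ro = new Region(new byte[] {1, 2, 3, 4}, true);
        byte[] out = new byte[2];
        ro.copyToArray(1, out, 0, 2);             // allowed: source is only read
        System.out.println(Arrays.toString(out)); // [2, 3]
        try {
            ro.copyFromArray(new byte[] {9}, 0, 0, 1); // rejected: region is read-only
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```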
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/555