[External] : Re: MemorySegment.ofAddress(...).reinterpret(...)
Brian S O'Neill
bronee at gmail.com
Fri Jul 7 14:07:40 UTC 2023
Of the array copies, ~60% are native to native, which shouldn't have any
bounds checks because they're copying within the unbounded "ALL"
segment. About 5% of the copies are native to Java byte arrays, and the
rest (~35%) are Java byte arrays to native. The length check for copying
into byte arrays is definitely the critical one, and I wouldn't argue in
favor of dropping it.
As for unchecked copying from byte[], dropping the check entirely seems
okay to me as long as it's restricted. I'll check overall performance
again when those fixes have been applied.
As for extra warmup, the test runs for about 9 minutes, so that should
be enough to account for any extra inlining reductions required with the
Panama version. When the fixes have been applied, I'll perform several
test runs and average the results to get a better measurement.
I'm still concerned about all the layers of code that have to be inlined
and optimized, ultimately reducing to much simpler code. Is there a
point at which HotSpot gives up inlining? There's also a bunch of other
safety checks (null segments, casts, segment size checks, alignment,
multiply by 1, readOnly check, base offsets, etc) that aren't needed at
all with a plain JAVA_BYTE copy within the "ALL" segment.
On 2023-07-07 05:55 AM, Jorn Vernee wrote:
> The fix for: https://bugs.openjdk.org/browse/JDK-8311594 would help with
> single accesses as well. For copies to/from Java arrays, I also found
> another issue: https://bugs.openjdk.org/browse/JDK-8311597 fixing both
> of those should help. If this is a benchmark of a whole application, you
> might also be measuring some of the extra warmup needed for the Panama
> version.
>
> The Panama version is doing slightly more work, after all. Especially
> the bound checks on Java arrays is not something we can just drop. So, I
> think a slight regression is to be expected.
>
> Jorn
>
> On 07/07/2023 00:19, Brian S O'Neill wrote:
>> When I use the "ALL" MemorySegment instead of allocating
>> MemorySegments on the fly for copies, the performance regression drops
>> to ~2%. When I revert to using unsafe native field accesses (getInt,
>> etc) and unsafe memory copies, the performance is back to what it was
>> before (+/- 0.0019%).
>>
>> The unsafe field access alone gives me back a bit more performance
>> than the unsafe copy, but that's because field accesses are more
>> common. The test runs for about nine minutes and performs ~20 million
>> field accesses per second and ~12 million copies per second. Many of
>> the copies are less than 10 bytes in length, and so bounds checks
>> would add significant overhead.
>>
>> On 2023-07-06 11:45 AM, Brian S O'Neill wrote:
>>> Thanks for looking into this. I'm already doing unaligned accesses,
>>> so there's not much I can do at this point on my end. I'll try to
>>> identify which operations are the most expensive and report back.
>>>
More information about the panama-dev
mailing list