[External] : Re: MemorySegment.ofAddress(...).reinterpret(...)

Fri Jul 7 15:32:40 UTC 2023

 > As for unchecked copying from byte[], dropping the check entirely 
seems okay to me as long as it's restricted. I'll check overall 
performance again when those fixes have been applied.

We can't just drop an array bounds check, even if we were to make the 
method restricted. (Besides, a restricted method needs to check if the 
caller has native access, which adds its own overhead)

 > As for extra warmup, the test runs for about 9 minutes, so that 
should be enough to account for any extra inlining reductions required 
with the Panama version. When the fixes have been applied, I'll perform 
several test runs and average the results to get a better measurement.

It would depend on whether you're measuring the warmup period or not. Do 
you warmup first and then measure? If not, it would be useful to measure 
the two separately.

 > I'm still concerned about all the layers of code that have to be 
inlined and optimized, ultimately reducing to much simpler code. Is 
there a point at which HotSpot gives up inlining?

Yes, but we counter most of that by using @ForceInline in cases where we 
know that a method's implementation mostly folds away. Var/MethodHandles 
also use incremental inlining, which is very forgiving in terms of 
budget. If you're just using the MS::copy and MS::get/set methods I 
don't really expect inlining to be an issue.

 > There's also a bunch of other safety checks (null segments, casts, 
segment size checks, alignment, multiply by 1, readOnly check, base 
offsets, etc) that aren't needed at all with a plain JAVA_BYTE copy 
within the "ALL" segment.

Most of those checks fold away when using a constant 'ALL' segment. The 
only check that remains is there to see whether the access overflows 
Long.MAX_VALUE. We might be able to treat Long.MAX_VALUE as truly 
unbounded, and skip the check. But, that's a semantic change that I'd 
like to discuss with other team members first. Any ways, when looking at 
benchmarks, removing that extra check doesn't make a difference in terms 
of time, at least on my machine (I guess pipelining already does all the 
work).

Jorn

On 07/07/2023 16:07, Brian S O'Neill wrote:
> Of the array copies, ~60% are native to native, which shouldn't have 
> any bounds checks because they're copying within the unbounded "ALL" 
> segment. About 5% of the copies are native to Java byte arrays, and 
> the rest (~35%) are Java byte arrays to native. The length check for 
> copying into byte arrays is definitely the critical one, and I 
> wouldn't argue in favor of dropping it.
>
> As for unchecked copying from byte[], dropping the check entirely 
> seems okay to me as long as it's restricted. I'll check overall 
> performance again when those fixes have been applied.
>
> As for extra warmup, the test runs for about 9 minutes, so that should 
> be enough to account for any extra inlining reductions required with 
> the Panama version. When the fixes have been applied, I'll perform 
> several test runs and average the results to get a better measurement.
>
> I'm still concerned about all the layers of code that have to be 
> inlined and optimized, ultimately reducing to much simpler code. Is 
> there a point at which HotSpot gives up inlining? There's also a bunch 
> of other safety checks (null segments, casts, segment size checks, 
> alignment, multiply by 1, readOnly check, base offsets, etc) that 
> aren't needed at all with a plain JAVA_BYTE copy within the "ALL" 
> segment.
>
>
> On 2023-07-07 05:55 AM, Jorn Vernee wrote:
>> The fix for: https://bugs.openjdk.org/browse/JDK-8311594 would help 
>> with single accesses as well. For copies to/from Java arrays, I also 
>> found another issue: https://bugs.openjdk.org/browse/JDK-8311597 
>> fixing both of those should help. If this is a benchmark of a whole 
>> application, you might also be measuring some of the extra warmup 
>> needed for the Panama version.
>>
>> The Panama version is doing slightly more work, after all. Especially 
>> the bound checks on Java arrays is not something we can just drop. 
>> So, I think a slight regression is to be expected.
>>
>> Jorn
>>
>> On 07/07/2023 00:19, Brian S O'Neill wrote:
>>> When I use the "ALL" MemorySegment instead of allocating 
>>> MemorySegments on the fly for copies, the performance regression 
>>> drops to ~2%. When I revert to using unsafe native field accesses 
>>> (getInt, etc) and unsafe memory copies, the performance is back to 
>>> what it was before (+/- 0.0019%).
>>>
>>> The unsafe field access alone gives me back a bit more performance 
>>> than the unsafe copy, but that's because field accesses are more 
>>> common. The test runs for about nine minutes and performs ~20 
>>> million field accesses per second and ~12 million copies per second. 
>>> Many of the copies are less than 10 bytes in length, and so bounds 
>>> checks would add significant overhead.
>>>
>>> On 2023-07-06 11:45 AM, Brian S O'Neill wrote:
>>>> Thanks for looking into this. I'm already doing unaligned accesses, 
>>>> so there's not much I can do at this point on my end. I'll try to 
>>>> identify which operations are the most expensive and report back.
>>>>