[foreign-memaccess+abi] RFR: Split foreign vector load and store by null or not null base [v2]
Radoslaw Smogura
duke at openjdk.org
Mon Aug 29 20:57:31 UTC 2022
On Wed, 24 Aug 2022 18:33:00 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:
>> Split the store / load operations by checking whether the base is null
>> or not null.
>>
>> When this is done, the base in Unsafe is not perceived by the VM as
>> mixed access, and the VM does not insert barriers.
>>
>> Test results give the expected values, where the case of polluted access is a 2x multiple of normal access.
>>
>> After
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 7.437 ± 0.195 ns/op
>> MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.593 ± 0.371 ns/op
>> MemorySegmentVectorAccess.heapSegments 1024 avgt 10 16.997 ± 0.118 ns/op
>> MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 58.673 ± 105.783 ns/op
>> MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 67.216 ± 16.157 ns/op
>> MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 122.567 ± 263.950 ns/op
>> MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 114.725 ± 209.183 ns/op
>>
>>
>> Before
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 8.547 ± 0.115 ns/op
>> MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.536 ± 0.082 ns/op
>> MemorySegmentVectorAccess.heapSegments 1024 avgt 10 15.818 ± 0.101 ns/op
>> MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 146.380 ± 1.127 ns/op
>> MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 290.784 ± 7.274 ns/op
>> MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 297.187 ± 5.096 ns/op
>> MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 310.166 ± 9.310 ns/op
>>
>>
>> Additionally, with profiling of the `load` and `store` method arguments as
>> described in [1]
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 7.480 ± 0.169 ns/op
>> MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.497 ± 0.062 ns/op
>> MemorySegmentVectorAccess.heapSegments 1024 avgt 10 16.829 ± 0.132 ns/op
>> MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 145.436 ± 1.081 ns/op
>> MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 291.081 ± 2.297 ns/op
>> MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 305.388 ± 7.518 ns/op
>> MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 303.931 ± 3.412 ns/op
>>
>>
>> [1] https://github.com/openjdk/panama-foreign/pull/700
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
>
> Add unswitching to masked vector operations
> Add benchmark covering this.
>
> After
> ```
> Benchmark (size) Mode Cnt Score Error Units
> MemorySegmentMaskedVectorAccess.arrayCopy 1024 avgt 10 16.700 ± 0.612 ns/op
> MemorySegmentMaskedVectorAccess.directSegments 1024 avgt 10 80.429 ± 2.897 ns/op
> MemorySegmentMaskedVectorAccess.heapSegments 1024 avgt 10 25.528 ± 0.296 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments2 1024 avgt 10 122.809 ± 0.894 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments3 1024 avgt 10 252.930 ± 4.623 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments4 1024 avgt 10 451.579 ± 6.429 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments5 1024 avgt 10 446.500 ± 39.156 ns/op
> ```
>
> Before
> ```
> Benchmark (size) Mode Cnt Score Error Units
> MemorySegmentMaskedVectorAccess.arrayCopy 1024 avgt 10 21.089 ± 0.219 ns/op
> MemorySegmentMaskedVectorAccess.directSegments 1024 avgt 10 81.384 ± 1.008 ns/op
> MemorySegmentMaskedVectorAccess.heapSegments 1024 avgt 10 25.626 ± 0.522 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments2 1024 avgt 10 217.733 ± 5.467 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments3 1024 avgt 10 441.045 ± 9.749 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments4 1024 avgt 10 522.613 ± 104.997 ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments5 1024 avgt 10 449.814 ± 8.203 ns/op
> ```
I think I need help, as I've found that a large number of deoptimizations happen when I execute the following code (both in the case of the Java split and the VM split):
    public static int test3(MemorySegment in, MemorySegment out, MemorySegment out2, byte[] arr) {
        long sz = in.byteSize();
        var zero = ByteVector.zero(SPECIES_BYTE);
        for (long i = 0; i < SPECIES_BYTE.loopBound(in.byteSize()); i += SPECIES_BYTE.vectorByteSize()) {
            var v1 = ByteVector.fromMemorySegment(SPECIES_BYTE, in, i, ByteOrder.nativeOrder());
            // arr[i] = (byte) 0;
            v1.intoMemorySegment(out, i, ByteOrder.nativeOrder());
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        var session = MemorySession.openConfined();
        MemorySegment heapIn = MemorySegment.ofArray(new byte[size]);
        MemorySegment heapOu = MemorySegment.ofArray(new byte[size]);
        MemorySegment directIn = MemorySegment.allocateNative(size, session);
        MemorySegment directOu = MemorySegment.allocateNative(size, session);
        for (int i = 0; i < 30_000; i++) {
            test3(heapIn, heapOu, heapOu, (byte[]) heapOu.array().get());
            test3(directIn, directOu, directOu, (byte[]) heapOu.array().get());
        }
    }
In the compilation log I see a huge number of entries like:
<deoptimized thread='31917' reason='constraint' pc='0x00007fffe134cdc7' compile_id='895' compiler='c1' level='3'>
<jvms bci='78' method='eu.smogura.panama.tests.vectorscopy.Main test3 (Ljava/lang/foreign/MemorySegment;Ljava/lang/foreign/MemorySegment;Ljava/lang/foreign/MemorySegment;[B)I' bytes='83' count='6061' backedge_count='39955677' iicount='6061' decompiles='95' profile_predicate_traps='100' overflow_recompiles='92'/>
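To put a number on "huge", the `<deoptimized ...>` entries can be tallied per `reason` attribute. The following is only a minimal sketch: the class name `DeoptTally` and the method `countByReason` are made up for illustration, and the regular expression assumes the attribute layout seen in the excerpt above rather than any documented log schema.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of the PR.
final class DeoptTally {
    // Captures the reason attribute of a <deoptimized ...> element,
    // assuming single-quoted attributes as in the excerpt above.
    private static final Pattern DEOPT =
            Pattern.compile("<deoptimized\\b[^>]*\\breason='([^']*)'");

    /** Counts deoptimization entries per reason across the given log lines. */
    static Map<String, Integer> countByReason(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = DEOPT.matcher(line);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: DeoptTally <compilation-log-file>");
            return;
        }
        // The first argument is the path to the file written by -XX:+LogCompilation.
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        countByReason(lines).forEach((reason, n) -> System.out.println(reason + ": " + n));
    }
}
```

Running it against the log file prints one line per deoptimization reason with its count, which makes it easier to compare the Java split and the VM split.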
The VM options I use are:
"-XX:+UnlockDiagnosticVMOptions", "-XX:CompileCommand=dontinline,*::test3*", "-XX:+LogCompilation"
Another thing I noticed: there is a huge number of _PhaseIdealLoop_ phases (it hits the allowed maximum), and it creates 64 CountedLoopNodes for the main part of the loop (there should be at most 4 unswitched branches).
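The expectation of at most 4 branches comes from two loop-invariant checks (load base null or not, store base null or not), giving 2 x 2 combinations. The sketch below is only a plain-Java model of that shape, not the code in the PR: `NullBaseSplit` and its methods are hypothetical, and a direct `ByteBuffer` stands in for the off-heap memory that the real code reaches through Unsafe.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the null / non-null base split; not the PR's code.
final class NullBaseSplit {
    // Stand-in for off-heap memory (assumption: the real code uses Unsafe).
    static final ByteBuffer NATIVE = ByteBuffer.allocateDirect(64);

    /** Load split on base nullness, so each branch sees only one kind of base. */
    static byte load(byte[] base, int offset) {
        if (base == null) {
            return NATIVE.get(offset);   // off-heap path: offset acts as the address
        } else {
            return base[offset];         // on-heap path: offset indexes the array
        }
    }

    /** Store split the same way as load. */
    static void store(byte[] base, int offset, byte value) {
        if (base == null) {
            NATIVE.put(offset, value);
        } else {
            base[offset] = value;
        }
    }

    /** Both null checks are loop-invariant, so they are candidates for unswitching. */
    static void copy(byte[] inBase, byte[] outBase, int length) {
        for (int i = 0; i < length; i++) {
            store(outBase, i, load(inBase, i));
        }
    }

    /** Names the loop variant selected by the two invariant checks. */
    static String variant(byte[] inBase, byte[] outBase) {
        return (inBase == null ? "native" : "heap") + "->" + (outBase == null ? "native" : "heap");
    }

    public static void main(String[] args) {
        byte[][] bases = { null, new byte[64] };
        Set<String> variants = new LinkedHashSet<>();
        for (byte[] in : bases) {
            for (byte[] out : bases) {
                copy(in, out, 64);
                variants.add(variant(in, out));
            }
        }
        System.out.println(variants.size() + " variants: " + variants);
    }
}
```

Enumerating the combinations in `main` yields four distinct variants, which is why 64 CountedLoopNodes for the main loop looks suspicious.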
I wonder if someone could help me with this concern?
-------------
PR: https://git.openjdk.org/panama-foreign/pull/711