RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2]

Amit Kumar amitkumar at openjdk.org
Thu Apr 17 10:28:40 UTC 2025


On Wed, 9 Apr 2025 08:57:40 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> Unsafe::setMemory intrinsic implementation for s390x. 
>> 
>> Stub Code: 
>> 
>> 
>> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes)
>> --------------------------------------------------------------------------------
>>   0x000003ffb04b63c0:   ogrk	%r1,%r2,%r3
>>   0x000003ffb04b63c4:   nill	%r1,7
>>   0x000003ffb04b63c8:   je	0x000003ffb04b6410
>>   0x000003ffb04b63cc:   nill	%r1,3
>>   0x000003ffb04b63d0:   je	0x000003ffb04b6460
>>   0x000003ffb04b63d4:   nill	%r1,1
>>   0x000003ffb04b63d8:   jlh	0x000003ffb04b64a0
>>   0x000003ffb04b63dc:   risbg	%r4,%r4,48,55,8
>>   0x000003ffb04b63e2:   risbgz	%r1,%r3,32,63,62
>>   0x000003ffb04b63e8:   je	0x000003ffb04b6402
>>   0x000003ffb04b63ec:   nopr
>>   0x000003ffb04b63ee:   nopr
>>   0x000003ffb04b63f0:   sth	%r4,0(%r2)
>>   0x000003ffb04b63f4:   sth	%r4,2(%r2)
>>   0x000003ffb04b63f8:   agfi	%r2,4
>>   0x000003ffb04b63fe:   brct	%r1,0x000003ffb04b63f0
>>   0x000003ffb04b6402:   nilf	%r3,2
>>   0x000003ffb04b6408:   ber	%r14
>>   0x000003ffb04b640a:   sth	%r4,0(%r2)
>>   0x000003ffb04b640e:   br	%r14
>>   0x000003ffb04b6410:   risbg	%r4,%r4,48,55,8
>>   0x000003ffb04b6416:   risbg	%r4,%r4,32,47,16
>>   0x000003ffb04b641c:   risbg	%r4,%r4,0,31,32
>>   0x000003ffb04b6422:   risbgz	%r1,%r3,32,63,60
>>   0x000003ffb04b6428:   je	0x000003ffb04b6446
>>   0x000003ffb04b642c:   nopr
>>   0x000003ffb04b642e:   nopr
>>   0x000003ffb04b6430:   stg	%r4,0(%r2)
>>   0x000003ffb04b6436:   stg	%r4,8(%r2)
>>   0x000003ffb04b643c:   agfi	%r2,16
>>   0x000003ffb04b6442:   brct	%r1,0x000003ffb04b6430
>>   0x000003ffb04b6446:   nilf	%r3,8
>>   0x000003ffb04b644c:   ber	%r14
>>   0x000003ffb04b644e:   stg	%r4,0(%r2)
>>   0x000003ffb04b6454:   br	%r14
>>   0x000003ffb04b6456:   nopr
>>   0x000003ffb04b6458:   nopr
>>   0x000003ffb04b645a:   nopr
>>   0x000003ffb04b645c:   nopr
>>   0x000003ffb04b645e:   nopr
>>   0x000003ffb04b6460:   risbg	%r4,%r4,48,55,8
>>   0x000003ffb04b6466:   risbg	%r4,%r4,32,47,16
>>   0x000003ffb04b646c:   risbgz	%r1,%r3,32,63,61
>>   0x000003ffb04b6472:   je	0x000003ffb04b6492
>>   0x000003ffb04b6476:   nopr
>>   0x000003ffb04b6478:   nopr
>>   0x000003ffb04b647a:   nopr
>>   0x000003ffb04b647c:   nopr
>>   0x000003ffb04b647e:   nopr
>>   0x000003ffb04b6480:   st	%r4,0(%r2)
>>   0x000003ffb04b6484:   st	%r4,4(%r2)
>>   0x000003ffb04b6488:   agfi	%r2,8
>>   0x000003ffb04b648e:   brct	%r1,0x000003ffb04b6480
>>   0x000003ffb04b6492:   nilf	%r3,4
>>   0x000003ffb04b6498:   ber	%r14
>>   0x000003ffb04b649a:   st	%r4,0(%r2)
>>   0x0000...
>
> Amit Kumar has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - reviews for Martin
>  - Revert "minor improvement"
>    
>    This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98.
>  - minor improvement
>  - reviews from Lutz and Martin

These results are from a shared machine, but it looks like the regression is fixed. 

We got the regression because, in the unaligned case, only 1-byte store instructions (i.e. `stc`) were being emitted. Alignment depends on two factors (`size` and the address where we are storing the value), so we can't always tell exactly whether a given benchmark run hits the aligned or the unaligned case. 
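
To illustrate why both factors matter, here is a rough Java sketch of the dispatch at the top of the stub (`ogrk`, then `nill` with 7, 3 and 1). The class and method names are made up for this example and are not part of the patch; it only models which store width would be picked, not the actual stores.

```java
// Hypothetical model of the stub's entry dispatch; names are illustrative only.
final class StoreWidthSketch {

    /**
     * Returns the widest store width (8, 4, 2 or 1 bytes) that divides both
     * the destination address and the size, mirroring
     * "ogrk %r1,%r2,%r3" followed by the "nill %r1,7 / 3 / 1" tests.
     */
    public static int storeWidth(long address, long size) {
        long bits = address | size;        // ogrk: low bits of both operands combined
        if ((bits & 7) == 0) return 8;     // both multiples of 8 -> stg loop
        if ((bits & 3) == 0) return 4;     // both multiples of 4 -> st loop
        if ((bits & 1) == 0) return 2;     // both even           -> sth loop
        return 1;                          // anything else       -> byte path
    }

    public static void main(String[] args) {
        System.out.println(storeWidth(0x1000L, 64)); // 8
        System.out.println(storeWidth(0x1004L, 12)); // 4
        System.out.println(storeWidth(0x1000L, 63)); // 1: odd size, even with an aligned address
        System.out.println(storeWidth(0x1001L, 64)); // 1: odd address, even with a "nice" size
    }
}
```

So in this model an odd `size` or an odd address alone is enough to land in the byte path, which is why the benchmark's `aligned` flag by itself does not tell us which path a run takes.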

I will do further testing and see whether more optimization can be done. Then I will mark this PR ready for review. 


Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30  2.893 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30  3.122 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30  3.286 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30  3.401 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30  3.291 ± 0.021  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30  3.455 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30  3.471 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30  3.215 ± 0.033  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30  4.632 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30  3.815 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  9.695 ± 0.036  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  5.296 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  9.682 ± 0.011  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  9.508 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30  2.887 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30  3.134 ± 0.024  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30  3.285 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30  3.397 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30  3.297 ± 0.049  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30  3.445 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30  3.471 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30  3.204 ± 0.023  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30  4.630 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30  3.811 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  9.676 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  9.690 ± 0.031  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  9.678 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  4.180 ± 0.010  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  2.636 ± 0.060  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  2.379 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  7.743 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  2.531 ± 0.113  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  7.746 ± 0.012  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  3.183 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  7.742 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  2.580 ± 0.095  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  7.870 ± 0.184  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  2.523 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  7.757 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  3.580 ± 0.005  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  7.744 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  8.090 ± 0.110  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  2.683 ± 0.025  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  7.747 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  7.738 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  7.745 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  7.773 ± 0.064  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  7.736 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  7.747 ± 0.010  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  7.748 ± 0.030  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  7.735 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  7.747 ± 0.020  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  7.746 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  7.743 ± 0.012  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  7.741 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  2.739 ± 0.005  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

Stub code generated with the current code: 


StubRoutines::unsafe_setmemory [0x000003ff9c4b63c0, 0x000003ff9c4b64dc] (284 bytes)
--------------------------------------------------------------------------------
BFD: unknown S/390 disassembler option: s390
.long	0x00000000
  0x000003ff9c4b63c0:   ogrk	%r1,%r2,%r3
  0x000003ff9c4b63c4:   nill	%r1,7
  0x000003ff9c4b63c8:   je	0x000003ff9c4b641e
  0x000003ff9c4b63cc:   nill	%r1,3
  0x000003ff9c4b63d0:   je	0x000003ff9c4b6464
  0x000003ff9c4b63d4:   nill	%r1,1
  0x000003ff9c4b63d8:   jne	0x000003ff9c4b649e
  0x000003ff9c4b63dc:   risbg	%r4,%r4,48,55,8
  0x000003ff9c4b63e2:   risbgz	%r1,%r3,32,63,62
  0x000003ff9c4b63e8:   je	0x000003ff9c4b6410
  0x000003ff9c4b63ec:   nopr
  0x000003ff9c4b63ee:   nopr
  0x000003ff9c4b63f0:   nopr
  0x000003ff9c4b63f2:   nopr
  0x000003ff9c4b63f4:   nopr
  0x000003ff9c4b63f6:   nopr
  0x000003ff9c4b63f8:   nopr
  0x000003ff9c4b63fa:   nopr
  0x000003ff9c4b63fc:   nopr
  0x000003ff9c4b63fe:   nopr
  0x000003ff9c4b6400:   sth	%r4,0(%r2)
  0x000003ff9c4b6404:   sth	%r4,2(%r2)
  0x000003ff9c4b6408:   aghi	%r2,4
  0x000003ff9c4b640c:   brct	%r1,0x000003ff9c4b6400
  0x000003ff9c4b6410:   nilf	%r3,2
  0x000003ff9c4b6416:   ber	%r14
  0x000003ff9c4b6418:   sth	%r4,0(%r2)
  0x000003ff9c4b641c:   br	%r14
  0x000003ff9c4b641e:   risbg	%r4,%r4,48,55,8
  0x000003ff9c4b6424:   risbg	%r4,%r4,32,47,16
  0x000003ff9c4b642a:   risbg	%r4,%r4,0,31,32
  0x000003ff9c4b6430:   risbgz	%r1,%r3,32,63,60
  0x000003ff9c4b6436:   je	0x000003ff9c4b6454
  0x000003ff9c4b643a:   nopr
  0x000003ff9c4b643c:   nopr
  0x000003ff9c4b643e:   nopr
  0x000003ff9c4b6440:   stg	%r4,0(%r2)
  0x000003ff9c4b6446:   stg	%r4,8(%r2)
  0x000003ff9c4b644c:   aghi	%r2,16
  0x000003ff9c4b6450:   brct	%r1,0x000003ff9c4b6440
  0x000003ff9c4b6454:   nilf	%r3,8
  0x000003ff9c4b645a:   ber	%r14
  0x000003ff9c4b645c:   stg	%r4,0(%r2)
  0x000003ff9c4b6462:   br	%r14
  0x000003ff9c4b6464:   risbg	%r4,%r4,48,55,8
  0x000003ff9c4b646a:   risbg	%r4,%r4,32,47,16
  0x000003ff9c4b6470:   risbgz	%r1,%r3,32,63,61
  0x000003ff9c4b6476:   je	0x000003ff9c4b6490
  0x000003ff9c4b647a:   nopr
  0x000003ff9c4b647c:   nopr
  0x000003ff9c4b647e:   nopr
  0x000003ff9c4b6480:   st	%r4,0(%r2)
  0x000003ff9c4b6484:   st	%r4,4(%r2)
  0x000003ff9c4b6488:   aghi	%r2,8
  0x000003ff9c4b648c:   brct	%r1,0x000003ff9c4b6480
  0x000003ff9c4b6490:   nilf	%r3,4
  0x000003ff9c4b6496:   ber	%r14
  0x000003ff9c4b6498:   st	%r4,0(%r2)
  0x000003ff9c4b649c:   br	%r14
  0x000003ff9c4b649e:   cghi	%r3,256
  0x000003ff9c4b64a2:   jl	0x000003ff9c4b64c4
  0x000003ff9c4b64a6:   stc	%r4,0(%r2)
  0x000003ff9c4b64aa:   mvc	1(255,%r2),0(%r2)
  0x000003ff9c4b64b0:   aghi	%r2,256
  0x000003ff9c4b64b4:   aghi	%r3,-256
  0x000003ff9c4b64b8:   cghi	%r3,256
  0x000003ff9c4b64bc:   jh	0x000003ff9c4b64a6
  0x000003ff9c4b64c0:   ltr	%r3,%r3
  0x000003ff9c4b64c2:   ber	%r14
  0x000003ff9c4b64c4:   stc	%r4,0(%r2)
  0x000003ff9c4b64c8:   aghi	%r3,-2
  0x000003ff9c4b64cc:   blr	%r14
  0x000003ff9c4b64ce:   exrl	%r3,0x000003ff9c4b64d6
  0x000003ff9c4b64d4:   br	%r14
  0x000003ff9c4b64d6:   mvc	1(1,%r2),0(%r2)
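
For readers not used to s390x, two idioms in the listing above can be modelled in Java as follows. This is only a sketch with made-up names, not the stub itself: `broadcastByte` mirrors the three `risbg` steps of the 8-byte path, and `fillByPropagation` mirrors the `stc` + overlapping `mvc` pair of the byte path, assuming `mvc` copies one byte at a time from left to right. As far as I can tell, the tail does the same thing through `exrl`, with the length taken from `%r3`.

```java
// Hypothetical model of two idioms from the stub; names are illustrative only.
final class StubIdiomsSketch {

    /** Replicates the low byte of 'value' into all 8 bytes of a long. */
    public static long broadcastByte(int value) {
        long v = value & 0xFFL;  // keep only the fill byte
        v |= v << 8;             // risbg %r4,%r4,48,55,8  : byte  -> halfword
        v |= v << 16;            // risbg %r4,%r4,32,47,16 : half  -> word
        v |= v << 32;            // risbg %r4,%r4,0,31,32  : word  -> doubleword
        return v;
    }

    /**
     * Fills mem[offset .. offset+length-1] by storing one byte and then
     * copying each byte from its left neighbour, which is the effect of
     * "stc" followed by "mvc 1(len-1,base),0(base)".
     */
    public static void fillByPropagation(byte[] mem, int offset, int length, byte value) {
        if (length <= 0) {
            return;                       // nothing to fill
        }
        mem[offset] = value;              // stc: seed the first byte
        for (int i = 1; i < length; i++) {
            // Strictly left-to-right: each copy reads the byte written just before it.
            mem[offset + i] = mem[offset + i - 1];
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(broadcastByte(0xAB))); // abababababababab
        byte[] mem = new byte[8];
        fillByPropagation(mem, 2, 4, (byte) 0x5A);
        System.out.println(java.util.Arrays.toString(mem)); // [0, 0, 90, 90, 90, 90, 0, 0]
    }
}
```

Note that `System.arraycopy` would not reproduce the second idiom, because it handles overlapping ranges as if the source were copied to a temporary first; the explicit loop is what models the propagation.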

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2809303487
PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2812434376


More information about the hotspot-compiler-dev mailing list