RFR: 8338417: Explicitly pin a virtual thread before acquiring the JFR string pool monitor [v3]
Markus Grönlund
mgronlun at openjdk.org
Wed Aug 21 14:31:03 UTC 2024
On Wed, 21 Aug 2024 14:21:39 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:
>> Thread.currentThread() has an intrinsic, and isVirtual is just a type check. ContinuationSupport.isSupported reads a static final so will disappear once compiled. The pattern we are using in other areas is for the pin to return a boolean (like David suggested).
>
> I looked into this in more detail. The current suggestion:
>
> mov r10,QWORD PTR [r15+0x388] ; _vthread OopHandle
> mov r10,QWORD PTR [r10] ; dereference OopHandle <-- Thread.currentThread() intrinsic gives 2 instructions
> mov r11d,DWORD PTR [r10+0x8] ; InstanceKlass to r11 <-- isVirtual()
> mov r10d,r11d ; InstanceKlass to r10
> mov r8,QWORD PTR [r10+0x40] ; Load slot in InstanceKlass primary supers array to r8
> movabs r10,0x2d0481a8 ; InstanceKlass for {metadata('java/lang/BaseVirtualThread')} to r10
> cmp r8,r10 ; compare if superklass is java/lang/BaseVirtualThread
> jne 0x0000018571e0baf9 ; 6 instructions for isVirtual() type check, 8 instructions in total
>
> This gives a prologue of eight instructions.
>
> For JFR, we already have much of this information resolved when loading the EventWriter instance using the existing intrinsic getEventWriter(). Hence, we could extend that intrinsic to mark the event writer with a field that says whether pinning should be performed. This results in only a two-instruction prologue:
>
> test r8d,r8d ; pinVirtualThread?
> je 0x0000012580a0f6c9 ; 2 instructions for test
>
> This is roughly a 4x reduction in the prologue (eight instructions down to two), although the net gain is slightly smaller because loading the event writer now requires one additional store instruction.
>
> Further, I looked into the Continuation.pin() and Continuation.unpin() methods. They are currently not intrinsics, but lend themselves well to intrinsification. I have created such intrinsics, and the results are quite good.
>
> Continuation.pin() or Continuation.unpin() without intrinsics = 112 instructions each
> Continuation.pin() or Continuation.unpin() with intrinsics = 8 instructions each
>
> This is a 14x reduction in instruction count (112 down to 8) for virtual threads.
I plan to fix the event writer under this PR (to be updated) and file a separate tracking enhancement for the intrinsification of Continuation.pin() and Continuation.unpin().
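As a rough illustration of the event writer approach described above, here is a minimal Java sketch. It is not the actual JFR code: PinSketch, Writer, underStringPoolLock and STRING_POOL_LOCK are hypothetical names, and pin()/unpin() are counting stand-ins for Continuation.pin()/Continuation.unpin(), which live in a JDK-internal package. The only point is that the per-call check reduces to testing a boolean field that was resolved once, when the writer was set up.

```java
import java.util.function.Supplier;

final class PinSketch {
    // Hypothetical stand-ins for Continuation.pin()/unpin(); they only count calls.
    static int pinCalls = 0;
    static int unpinCalls = 0;
    static void pin()   { pinCalls++; }
    static void unpin() { unpinCalls++; }

    // Hypothetical event writer: the flag is decided once, when the writer is
    // created for a thread, instead of re-checking the thread type on every call.
    static final class Writer {
        final boolean pinVirtualThread;
        Writer(boolean pinVirtualThread) { this.pinVirtualThread = pinVirtualThread; }
    }

    // Assumed name for the monitor that guards the string pool.
    static final Object STRING_POOL_LOCK = new Object();

    // Pin (if requested) before taking the monitor, and unpin only if we pinned.
    static <T> T underStringPoolLock(Writer writer, Supplier<T> body) {
        boolean pinned = writer.pinVirtualThread; // the cheap flag test
        if (pinned) {
            pin();
        }
        try {
            synchronized (STRING_POOL_LOCK) {
                return body.get();
            }
        } finally {
            if (pinned) {
                unpin();
            }
        }
    }

    public static void main(String[] args) {
        underStringPoolLock(new Writer(true), () -> "virtual");
        underStringPoolLock(new Writer(false), () -> "platform");
        System.out.println("pin=" + pinCalls + " unpin=" + unpinCalls);
    }
}
```

Keeping the result of the flag test in a local boolean means the unpin in the finally block always matches the pin, which is the same idea as having the pin operation return a boolean.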
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20588#discussion_r1725150866