[riscv-port] RFR: 8277391: riscv: implement tls function MacroAssembler::get_thread() by using tls.ie
zhengxiaolinX
duke at openjdk.java.net
Fri Nov 19 11:19:10 UTC 2021
On Fri, 19 Nov 2021 03:36:00 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> Hi team,
>>
>> This patch fixes the unimplemented assembly version of `__ get_thread()`, bringing some performance improvement by directly using the x4 register to refer to the variable `Thread::_thr_current`. Tested tests under hotspot/ and jdk/. [Original patch](https://github.com/riscv-collab/riscv-openjdk/pull/4)
>>
>> Thanks,
>> Xiaolin
>
> src/hotspot/os_cpu/linux_riscv/threadLS_linux_riscv.S line 31:
>
>> 29: .type _ZN10JavaThread25riscv64_get_thread_helperEv, %function
>> 30:
>> 31: _ZN10JavaThread25riscv64_get_thread_helperEv:
>
> For maximum flexibility, I would suggest we switch to use the global-dynamic TLS model here if you don't have a strong reason to use the initial-exec TLS model. That's the default TLS model used by the host GCC compiler and I don't want to affected by the limits of the initial-exec TLS model some day.
> ---------------
> la.tls.gd a0, _ZN6Thread12_thr_currentE
> addi sp, sp, -16
> sd ra, 8(sp)
> call __tls_get_addr at plt
> ld ra, 8(sp)
> ld a0, 0(a0)
> addi sp, sp, 16
> ret
> ---------------
Hi Felix -
Quite sorry for the late reply - it takes me some time to look up some references. This code looks great without any problem and I totally agree with your point - so it is hard to make me not think you have implemented one version already in your codebase :-) And `__tls_get_addr`'s implementation could found itself at [this link](https://sources.debian.org/src/glibc/2.31-13/sysdeps/riscv/libc-tls.c/) with [macros defined](https://sources.debian.org/src/glibc/2.31-13/sysdeps/riscv/nptl/tls.h/), which is a C function. With this respect, it seems that I also need to save and restore all volatile registers pessimistically in `MacroAssembler::get_thread`, same as the current implementation on the master branch. I think it might be hard to keep the saved registers the same as AArch64's implementation that only clobbers x1 - please correct me if I missed something. But this seems much cheaper than calling a [pthread_getspecific](https://sources.debian.org/src/glibc/2.32-4/nptl/pthread_gets
pecific.c/?hl=24#L24). In fact, the first version of this patch is written in a similar way as yours - for better performance I chose to use tls.ie - but I think you hold a very reasonable point.
It's Friday night so please feel free to move the discussion to weekdays.
Thanks -
Xiaolin
-------------
PR: https://git.openjdk.java.net/riscv-port/pull/7
More information about the riscv-port-dev
mailing list