RFR: 8263718: unused-result warning happens at os_linux.cpp
Yasumasa Suenaga
ysuenaga at openjdk.java.net
Mon Mar 29 08:13:26 UTC 2021
On Sun, 21 Mar 2021 06:40:18 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
>> @magicus I checked assembly code of release build, it is similar to fastdebug build. The stack (RSP) is expanded.
>> (I confirmed it with Visual Studio 16.9.2 because I received update notification before your reply...)
>> // Try to randomize the cache line index of hot stack frames.
>> // This helps when threads of the same stack traces evict each other's
>> // cache lines. The threads can be either from the same JVM instance, or
>> // from different JVM instances. The benefit is especially true for
>> // processors with hyperthreading technology.
>> static int counter = 0;
>> int pid = os::current_process_id();
>> 00007FF80F4E10ED mov eax,dword ptr [_initial_pid (07FF80F9D8164h)]
>> 00007FF80F4E10F3 test eax,eax
>> 00007FF80F4E10F5 jne thread_native_entry+3Dh (07FF80F4E10FDh)
>> 00007FF80F4E10F7 call qword ptr [__imp__getpid (07FF80F6EF748h)]
>> _alloca(((pid ^ counter++) & 7) * 128);
>> 00007FF80F4E10FD mov ecx,dword ptr [counter (07FF80F9D8300h)]
>> 00007FF80F4E1103 mov edx,ecx
>> 00007FF80F4E1105 xor edx,eax
>> 00007FF80F4E1107 inc ecx
>> 00007FF80F4E1109 mov dword ptr [counter (07FF80F9D8300h)],ecx
>> 00007FF80F4E110F and edx,7
>> 00007FF80F4E1112 shl edx,7
>> 00007FF80F4E1115 mov eax,edx
>> 00007FF80F4E1117 lea rcx,[rdx+0Fh]
>> 00007FF80F4E111B cmp rcx,rax
>> 00007FF80F4E111E ja thread_native_entry+6Ah (07FF80F4E112Ah)
>> 00007FF80F4E1120 mov rcx,0FFFFFFFFFFFFFF0h
>> 00007FF80F4E112A and rcx,0FFFFFFFFFFFFFFF0h
>> 00007FF80F4E112E mov rax,rcx
>> 00007FF80F4E1131 call __chkstk (07FF80F6ED370h)
>> 00007FF80F4E1136 sub rsp,rcx
>
>> Did you measure on Alpine too, with muslc? And the XXXBsds? Are we sure we
>> measure the right thing? I wish there were regression tests telling us when
>> to re-apply this optimization.
>
> I think we can decide by whether `alloca()` or equivalent stack operation is contained in current binary.
> If current binary does not contain it like JDK 17 Linux x64, we can remove it because it already does not work - performance degradation will not happen.
>
> In its context, I think Alpine + musl libc is also ok to remove it because I confirmed JDK 17 for Alpine x64 (from jdk.java.net) does not contain stack operation same as x64.
>
> 0000000000b889b0 <thread_native_entry(Thread*)>:
> b889b0: 55 push %rbp
> b889b1: 48 89 e5 mov %rsp,%rbp
> b889b4: 41 56 push %r14
> b889b6: 41 55 push %r13
> b889b8: 49 89 fd mov %rdi,%r13
> b889bb: 41 54 push %r12
> b889bd: 53 push %rbx
> b889be: e8 6d a5 1a 00 callq d32f30 <Thread::record_stack_base_and_size()>
> b889c3: e8 e8 fb 6a ff callq 2385b0 <getpid at plt>
> b889c8: 4c 89 ef mov %r13,%rdi
> b889cb: 83 05 b6 98 61 00 01 addl $0x1,0x6198b6(%rip) # 11a2288 <thread_native_entry(Thread*)::counter>
> b889d2: e8 f9 a4 1a 00 callq d32ed0 <Thread::initialize_thread_current()>
> b889d7: 49 8b 9d 68 02 00 00 mov 0x268(%r13),%rbx
> b889de: 31 c0 xor %eax,%eax
>
>> (Please leave the alloca in the AIX implementation; we currently don't have
>> the cycles to run regression tests for this)
>
> Ok, I got it.
My colleague confirmed that `alloca()` (or equivalent stack operation) does not exist in JDK 16 from jdk.java.net:
(lldb) disass
libjvm.dylib`thread_native_entry:
-> 0x103892030 <+0>: pushq %rbp
0x103892031 <+1>: movq %rsp, %rbp
0x103892034 <+4>: pushq %r15
0x103892036 <+6>: pushq %r14
0x103892038 <+8>: pushq %r13
0x10389203a <+10>: pushq %r12
0x10389203c <+12>: pushq %rbx
0x10389203d <+13>: pushq %rax
0x10389203e <+14>: movq %rdi, %r14
0x103892041 <+17>: callq 0x103a03690 ; Thread::record_stack_base_and_size()
0x103892046 <+22>: callq 0x103aed89a ; symbol stub for: getpid
0x10389204b <+27>: incl 0x452203(%rip) ; thread_native_entry(Thread*)::counter
0x103892051 <+33>: movq %r14, %rdi
0x103892054 <+36>: callq 0x103a03650 ; Thread::initialize_thread_current()
0x103892059 <+41>: movq 0x270(%r14), %rbx
0x103892060 <+48>: movq 0x50(%rbx), %r15
So I think we can remove `alloca()` from os_bsd, os_windows safely. In Linux, David confirms it did not show any degradation in performance, however we cannot evaluete platforms other than glibc. So I left `alloca()` call if it does not run on glibc, and I modified to affect it.
Could you review again?
(At request from Thomas, I've not changed os_aix.cpp in this PR.)
-------------
PR: https://git.openjdk.java.net/jdk/pull/3042
More information about the hotspot-dev
mailing list