RFR: 8263718: unused-result warning happens at os_linux.cpp

Yasumasa Suenaga ysuenaga at openjdk.java.net
Mon Mar 29 08:13:26 UTC 2021


On Sun, 21 Mar 2021 06:40:18 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:

>> @magicus I checked assembly code of release build, it is similar to fastdebug build. The stack (RSP) is expanded.
>> (I confirmed it with Visual Studio 16.9.2 because I received update notification before your reply...)
>>   // Try to randomize the cache line index of hot stack frames.
>>   // This helps when threads of the same stack traces evict each other's
>>   // cache lines. The threads can be either from the same JVM instance, or
>>   // from different JVM instances. The benefit is especially true for
>>   // processors with hyperthreading technology.
>>   static int counter = 0;
>>   int pid = os::current_process_id();
>> 00007FF80F4E10ED  mov         eax,dword ptr [_initial_pid (07FF80F9D8164h)]  
>> 00007FF80F4E10F3  test        eax,eax  
>> 00007FF80F4E10F5  jne         thread_native_entry+3Dh (07FF80F4E10FDh)  
>> 00007FF80F4E10F7  call        qword ptr [__imp__getpid (07FF80F6EF748h)]  
>>   _alloca(((pid ^ counter++) & 7) * 128);
>> 00007FF80F4E10FD  mov         ecx,dword ptr [counter (07FF80F9D8300h)]  
>> 00007FF80F4E1103  mov         edx,ecx  
>> 00007FF80F4E1105  xor         edx,eax  
>> 00007FF80F4E1107  inc         ecx  
>> 00007FF80F4E1109  mov         dword ptr [counter (07FF80F9D8300h)],ecx  
>> 00007FF80F4E110F  and         edx,7  
>> 00007FF80F4E1112  shl         edx,7  
>> 00007FF80F4E1115  mov         eax,edx  
>> 00007FF80F4E1117  lea         rcx,[rdx+0Fh]  
>> 00007FF80F4E111B  cmp         rcx,rax  
>> 00007FF80F4E111E  ja          thread_native_entry+6Ah (07FF80F4E112Ah)  
>> 00007FF80F4E1120  mov         rcx,0FFFFFFFFFFFFFF0h  
>> 00007FF80F4E112A  and         rcx,0FFFFFFFFFFFFFFF0h  
>> 00007FF80F4E112E  mov         rax,rcx  
>> 00007FF80F4E1131  call        __chkstk (07FF80F6ED370h)  
>> 00007FF80F4E1136  sub         rsp,rcx
>
>> Did you measure on Alpine too, with muslc? And the XXXBsds? Are we sure we
>> measure the right thing? I wish there were regression tests telling us when
>> to re-apply this optimization.
> 
> I think we can decide by whether `alloca()` or equivalent stack operation is contained in current binary.
> If current binary does not contain it like JDK 17 Linux x64, we can remove it because it already does not work - performance degradation will not happen.
> 
> In its context, I think Alpine + musl libc is also ok to remove it because I confirmed JDK 17 for Alpine x64 (from jdk.java.net) does not contain stack operation same as x64.
> 
> 0000000000b889b0 <thread_native_entry(Thread*)>:
>   b889b0:       55                      push   %rbp
>   b889b1:       48 89 e5                mov    %rsp,%rbp
>   b889b4:       41 56                   push   %r14
>   b889b6:       41 55                   push   %r13
>   b889b8:       49 89 fd                mov    %rdi,%r13
>   b889bb:       41 54                   push   %r12
>   b889bd:       53                      push   %rbx
>   b889be:       e8 6d a5 1a 00          callq  d32f30 <Thread::record_stack_base_and_size()>
>   b889c3:       e8 e8 fb 6a ff          callq  2385b0 <getpid at plt>
>   b889c8:       4c 89 ef                mov    %r13,%rdi
>   b889cb:       83 05 b6 98 61 00 01    addl   $0x1,0x6198b6(%rip)        # 11a2288 <thread_native_entry(Thread*)::counter>
>   b889d2:       e8 f9 a4 1a 00          callq  d32ed0 <Thread::initialize_thread_current()>
>   b889d7:       49 8b 9d 68 02 00 00    mov    0x268(%r13),%rbx
>   b889de:       31 c0                   xor    %eax,%eax
> 
>> (Please leave the alloca in the AIX implementation; we currently don't have
>> the cycles to run regression tests for this)
> 
> Ok, I got it.

My colleague confirmed that `alloca()` (or equivalent stack operation) does not exist in JDK 16 from jdk.java.net:

(lldb) disass
libjvm.dylib`thread_native_entry:
->  0x103892030 <+0>:   pushq  %rbp
    0x103892031 <+1>:   movq   %rsp, %rbp
    0x103892034 <+4>:   pushq  %r15
    0x103892036 <+6>:   pushq  %r14
    0x103892038 <+8>:   pushq  %r13
    0x10389203a <+10>:  pushq  %r12
    0x10389203c <+12>:  pushq  %rbx
    0x10389203d <+13>:  pushq  %rax
    0x10389203e <+14>:  movq   %rdi, %r14
    0x103892041 <+17>:  callq  0x103a03690               ; Thread::record_stack_base_and_size()
    0x103892046 <+22>:  callq  0x103aed89a               ; symbol stub for: getpid
    0x10389204b <+27>:  incl   0x452203(%rip)            ; thread_native_entry(Thread*)::counter
    0x103892051 <+33>:  movq   %r14, %rdi
    0x103892054 <+36>:  callq  0x103a03650               ; Thread::initialize_thread_current()
    0x103892059 <+41>:  movq   0x270(%r14), %rbx
    0x103892060 <+48>:  movq   0x50(%rbx), %r15

So I think we can remove `alloca()` from os_bsd, os_windows safely. In Linux, David confirms it did not show any degradation in performance, however we cannot evaluete platforms other than glibc. So I left `alloca()` call if it does not run on glibc, and I modified to affect it.
Could you review again?
(At request from Thomas, I've not changed os_aix.cpp in this PR.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/3042


More information about the hotspot-dev mailing list