RFR: 8263718: unused-result warning happens at os_linux.cpp

Sun Mar 21 13:30:15 UTC 2021

On 21/03/2021 5:22 pm, Thomas Stüfe wrote:
> On Sun, Mar 21, 2021 at 7:43 AM Yasumasa Suenaga <ysuenaga at openjdk.java.net>
> wrote:
> 
>> On Sat, 20 Mar 2021 12:35:18 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org>
>> wrote:
>>
>>>> @YaSuenag The code generated by debug builds is often significantly
>> different from release builds. You might want to check the latter instead
>> to figure out if this has any effect on what we actually ship.
>>>
>>> @magicus I checked assembly code of release build, it is similar to
>> fastdebug build. The stack (RSP) is expanded.
>>> (I confirmed it with Visual Studio 16.9.2 because I received update
>> notification before your reply...)
>>>    // Try to randomize the cache line index of hot stack frames.
>>>    // This helps when threads of the same stack traces evict each other's
>>>    // cache lines. The threads can be either from the same JVM instance,
>> or
>>>    // from different JVM instances. The benefit is especially true for
>>>    // processors with hyperthreading technology.
>>>    static int counter = 0;
>>>    int pid = os::current_process_id();
>>> 00007FF80F4E10ED  mov         eax,dword ptr [_initial_pid
>> (07FF80F9D8164h)]
>>> 00007FF80F4E10F3  test        eax,eax
>>> 00007FF80F4E10F5  jne         thread_native_entry+3Dh (07FF80F4E10FDh)
>>> 00007FF80F4E10F7  call        qword ptr [__imp__getpid
>> (07FF80F6EF748h)]
>>>    _alloca(((pid ^ counter++) & 7) * 128);
>>> 00007FF80F4E10FD  mov         ecx,dword ptr [counter (07FF80F9D8300h)]
>>> 00007FF80F4E1103  mov         edx,ecx
>>> 00007FF80F4E1105  xor         edx,eax
>>> 00007FF80F4E1107  inc         ecx
>>> 00007FF80F4E1109  mov         dword ptr [counter (07FF80F9D8300h)],ecx
>>> 00007FF80F4E110F  and         edx,7
>>> 00007FF80F4E1112  shl         edx,7
>>> 00007FF80F4E1115  mov         eax,edx
>>> 00007FF80F4E1117  lea         rcx,[rdx+0Fh]
>>> 00007FF80F4E111B  cmp         rcx,rax
>>> 00007FF80F4E111E  ja          thread_native_entry+6Ah (07FF80F4E112Ah)
>>> 00007FF80F4E1120  mov         rcx,0FFFFFFFFFFFFFF0h
>>> 00007FF80F4E112A  and         rcx,0FFFFFFFFFFFFFFF0h
>>> 00007FF80F4E112E  mov         rax,rcx
>>> 00007FF80F4E1131  call        __chkstk (07FF80F6ED370h)
>>> 00007FF80F4E1136  sub         rsp,rcx
>>
>>> Did you measure on Alpine too, with muslc? And the XXXBsds? Are we sure
>> we
>>> measure the right thing? I wish there were regression tests telling us
>> when
>>> to re-apply this optimization.
>>
>> I think we can decide by whether `alloca()` or equivalent stack operation
>> is contained in current binary.
>> If current binary does not contain it like JDK 17 Linux x64, we can remove
>> it because it already does not work - performance degradation will not
>> happen.
>>
>>
> How do you know this? We may have had a performance degradation since years
> without noticing.  The question is, if a performance optimization,
> considered necessary in the past, did bitrot, should we remove or repair it.
> 
> In its context, I think Alpine + musl libc is also ok to remove it because
>> I confirmed JDK 17 for Alpine x64 (from jdk.java.net) does not contain
>> stack operation same as x64.
>>
>> 0000000000b889b0 <thread_native_entry(Thread*)>:
>>    b889b0:       55                      push   %rbp
>>    b889b1:       48 89 e5                mov    %rsp,%rbp
>>    b889b4:       41 56                   push   %r14
>>    b889b6:       41 55                   push   %r13
>>    b889b8:       49 89 fd                mov    %rdi,%r13
>>    b889bb:       41 54                   push   %r12
>>    b889bd:       53                      push   %rbx
>>    b889be:       e8 6d a5 1a 00          callq  d32f30
>> <Thread::record_stack_base_and_size()>
>>    b889c3:       e8 e8 fb 6a ff          callq  2385b0 <getpid at plt>
>>    b889c8:       4c 89 ef                mov    %r13,%rdi
>>    b889cb:       83 05 b6 98 61 00 01    addl   $0x1,0x6198b6(%rip)
>> # 11a2288 <thread_native_entry(Thread*)::counter>
>>    b889d2:       e8 f9 a4 1a 00          callq  d32ed0
>> <Thread::initialize_thread_current()>
>>    b889d7:       49 8b 9d 68 02 00 00    mov    0x268(%r13),%rbx
>>    b889de:       31 c0                   xor    %eax,%eax
>>
>>
> which does not prove it is not beneficial, it just proves it is not there.

Right. It may be that it is of no benefit on glibc because glibc already 
does something clever here. But musl may not, in which case repairing 
the alloca call may improve performance on musl.

This is frustrating because on the one hand there is no point fixing a 
warning for something the compiler elides anyway; while on the other we 
can't even know if this quirky optimisation would be of benefit under 
some conditions.

And we don't know how to try and measure it regardless. :(

David
-----

> 
>>> (Please leave the alloca in the AIX implementation; we currently don't
>> have
>>> the cycles to run regression tests for this)
>>
>> Ok, I got it.
>>
>>
> Cheers, Thomas
> 
> 
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/3042
>>