RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options

Hao Sun haosun at openjdk.org
Wed Oct 12 02:04:45 UTC 2022


On Tue, 11 Oct 2022 09:28:59 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch implements histogram_bytecode() and histogram_bytecode_pair() for the AArch64 part of the interpreter. As in count_bytecode(), the counters are updated with atomic operations.
>> 
>> Here is part of the output produced with the -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch.
>> 
>> 
>> $ java -XX:+PrintBytecodeHistogram --version | head -20
>> openjdk 20-internal 2023-03-21
>> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev)
>> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode)
>> 
>> Histogram of 5004099 executed bytecodes:
>> 
>>   absolute  relative  code    name
>> ----------------------------------------------------------------------
>>     319124     6.38%    dc    fast_aload_0
>>     313397     6.26%    e0    fast_iload
>>     251436     5.02%    b6    invokevirtual
>>     227428     4.54%    19    aload
>>     166054     3.32%    a7    goto
>>     159167     3.18%    2b    aload_1
>>     151803     3.03%    de    fast_aaccess_0
>>     136787     2.73%    1b    iload_1
>>     124037     2.48%    36    istore
>>     118791     2.37%    84    iinc
>>     118121     2.36%    1c    iload_2
>>     110484     2.21%    a2    if_icmpge
>> 
>> $ java -XX:+PrintBytecodePairHistogram --version | head -20
>> openjdk 20-internal 2023-03-21
>> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev)
>> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode)
>> 
>> Histogram of 4804441 executed bytecode pairs:
>> 
>>   absolute  relative    codes    1st bytecode        2nd bytecode
>> ----------------------------------------------------------------------
>>      77602    1.615%    84 a7    iinc                goto
>>      49749    1.035%    36 e0    istore              fast_iload
>>      48931    1.018%    e0 10    fast_iload          bipush
>>      46294    0.964%    e0 b6    fast_iload          invokevirtual
>>      42661    0.888%    a7 e0    goto                fast_iload
>>      42243    0.879%    3a 19    astore              aload
>>      40138    0.835%    19 b9    aload               invokeinterface
>>      36617    0.762%    dc 2b    fast_aload_0        aload_1
>>      35745    0.744%    b7 dc    invokespecial       fast_aload_0
>>      35384    0.736%    19 b6    aload               invokevirtual
>>      35035    0.729%    b6 de    invokevirtual       fast_aaccess_0
>>      34667    0.722%    dc b6    fast_aload_0        invokevirtual
>> 
>> 
>> To verify correctness, I used the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit counts for individual bytecodes and bytecode pairs can be obtained by parsing the trace. I then compared those hit counts with the corresponding "absolute" columns. For several randomly selected bytecodes and bytecode pairs, this manual comparison showed that the "absolute" columns are correct.
>> 
>> Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type.
>> 
>> Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope.
>
> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1981:
> 
>> 1979: 
>> 1980: void TemplateInterpreterGenerator::count_bytecode() {
>> 1981:   Register rscratch3 = r10;
> 
> Please pass the scratch register to use as an argument to `TemplateInterpreterGenerator::generate_trace_code`

Thanks for your review, but I'm afraid I didn't fully understand this comment.

Why is `generate_trace_code` involved? I guess you mean `count_bytecode()`?
But `count_bytecode()` is invoked [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L362), and I don't think that shared call site is the right place to pass the arch-specific register `r10` to the generic `count_bytecode()`.

Please correct me if I misunderstood. Thanks.

> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 2008:
> 
>> 2006:           BytecodePairHistogram::log2_number_of_codes);
>> 2007:   __ stxrw(rscratch2, index, index_addr);
>> 2008:   __ cbnzw(rscratch2, L);  // retry to load _index
> 
> Please add `atomic_ldorrw` to the list of `ATOMIC_OP`s (in macroAssembler_aarch64.cpp) and use it here.

Agree. Will update.
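
For reference, the update performed by the exclusive load/store loop quoted above can be sketched in portable C++. This is only an illustration under stated assumptions, not the code in the patch: it uses a `std::atomic` compare-exchange loop in place of the AArch64 `ldxrw`/`stxrw` pair, it assumes the shift-and-or index formula and an 8-bit `log2_number_of_codes`, and the names `pair_index`, `pair_counters` and `record_pair` are hypothetical stand-ins for `BytecodePairHistogram::_index`, `BytecodePairHistogram::_counters` and the generated code.

```cpp
#include <atomic>
#include <cstdint>

// Assumption: each bytecode fits in 8 bits, so a pair fits in 16 bits.
constexpr unsigned kLog2NumberOfCodes = 8;
constexpr unsigned kNumberOfCodes = 1u << kLog2NumberOfCodes;
constexpr unsigned kNumberOfPairs = kNumberOfCodes * kNumberOfCodes;

// Hypothetical stand-ins for the histogram's index and counter table.
// Objects with static storage duration start out zeroed.
std::atomic<uint32_t> pair_index{0};
std::atomic<uint32_t> pair_counters[kNumberOfPairs];

// Records that `bytecode` was just executed and returns the pair slot used.
uint32_t record_pair(uint8_t bytecode) {
  uint32_t old_index = pair_index.load(std::memory_order_relaxed);
  uint32_t new_index;
  do {
    // Move the previous bytecode into the low byte (dropping the one
    // before it) and place the current bytecode in the high byte.
    new_index = (old_index >> kLog2NumberOfCodes) |
                (static_cast<uint32_t>(bytecode) << kLog2NumberOfCodes);
    // On failure, old_index is refreshed with the current value and the
    // loop retries, analogous to the stxrw/cbnzw retry.
  } while (!pair_index.compare_exchange_weak(old_index, new_index,
                                             std::memory_order_relaxed));
  // 32-bit atomic increment of the counter for this pair.
  pair_counters[new_index].fetch_add(1, std::memory_order_relaxed);
  return new_index;
}
```

With this encoding, calling `record_pair(0x84)` followed by `record_pair(0xa7)` selects slot `0xa784`, i.e. the pair `iinc`, `goto` from the histogram above, with the first bytecode in the low byte and the second in the high byte.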

-------------

PR: https://git.openjdk.org/jdk/pull/10642

