benchmark unaligned memory access

Thu Mar 5 02:38:30 UTC 2015

Aleksey,
Thank you for your reply. I'm developing an off-heap serialization library, and I'm wondering whether skipping alignment padding could save us a lot of memory. I guessed that memory prefetching might affect the results, so I changed the benchmark to iterate over a large memory region with different step sizes. The results do show an obvious difference between aligned and unaligned access as the step size increases, but the cost of aligned access stays steady and does not seem to be affected by the step size, which still confuses me. Anyway, I'll turn to StackOverflow for this issue...
While running those benchmarks with "-prof perf", I sometimes got the following perf stats, and I'm not sure whether they are abnormal: most cache/TLB load counts are 0. I'm not sure if this is related to JMH; I'm just posting it here for information.
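For readers following along, the stride experiment described above can be sketched roughly as follows. This is a minimal illustration and not the benchmark from this thread: it assumes Java 9+ (for ByteBuffer.alignedSlice), uses a direct ByteBuffer instead of Unsafe, and the names StrideSketch, allocateAligned and sumLongs are hypothetical. The timing in main is a crude one-shot measurement; real numbers would need a JMH harness around sumLongs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a strided aligned/unaligned read loop.
class StrideSketch {

    // Returns a direct (off-heap) buffer of exactly `bytes` bytes whose
    // start address is 8-byte aligned.
    static ByteBuffer allocateAligned(int bytes) {
        ByteBuffer raw = ByteBuffer.allocateDirect(bytes + 16); // slack for alignment
        ByteBuffer aligned = raw.alignedSlice(8);                // Java 9+
        aligned.limit(bytes);
        return aligned.slice().order(ByteOrder.nativeOrder());
    }

    // Reads one long every `step` bytes, starting at byte `offset`.
    // With step a multiple of 8, offset 0 gives aligned reads and
    // offset 1 gives unaligned reads.
    static long sumLongs(ByteBuffer buf, int offset, int step) {
        long sum = 0;
        for (int pos = offset; pos + Long.BYTES <= buf.capacity(); pos += step) {
            sum += buf.getLong(pos);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateAligned(16 * 1024 * 1024);
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) 1); // touch every page before measuring
        }
        for (int step : new int[] {8, 64, 4096}) {
            for (int offset : new int[] {0, 1}) {
                long t0 = System.nanoTime();
                long sum = sumLongs(buf, offset, step);
                long ns = System.nanoTime() - t0;
                System.out.println("step=" + step + " offset=" + offset
                        + " ns=" + ns + " sum=" + sum);
            }
        }
    }
}
```

Larger steps perform fewer reads per pass, so any comparison across step sizes should be normalized to cost per read rather than total time.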
Perf stats:--------------------------------------------------
          9551.219858 task-clock (msec)         #    0.312 CPUs utilized
                1,465 context-switches          #    0.153 K/sec
                  350 cpu-migrations            #    0.037 K/sec
                   88 page-faults               #    0.009 K/sec
       20,005,142,254 cycles                    #    2.095 GHz                     [22.25%]
       10,067,151,576 stalled-cycles-frontend   #   50.32% frontend cycles idle    [22.43%]
          123,390,551 stalled-cycles-backend    #    0.62% backend  cycles idle    [22.53%]
       37,104,168,166 instructions              #    1.85  insns per cycle
                                                #    0.27  stalled cycles per insn [28.10%]
        7,152,762,320 branches                  #  748.885 M/sec                   [28.17%]
              820,260 branch-misses             #    0.01% of all branches         [28.09%]
                    0 L1-dcache-loads           #    0.000 K/sec                   [ 0.00%]
4,375,650,713,294,279 L1-dcache-load-misses     #    0.00% of all L1-dcache hits   [ 0.00%]
1,061,862,286,476,262 LLC-loads                 # 111175567.337 M/sec              [ 0.00%]
  319,981,527,223,167 LLC-load-misses           #   30.13% of all LL-cache hits    [ 0.00%]
                    0 L1-icache-loads           #    0.000 K/sec                   [ 0.00%]
8,224,297,943,167,724 L1-icache-load-misses     #    0.00% of all L1-icache hits   [ 0.00%]
                    0 dTLB-loads                #    0.000 K/sec                   [ 0.00%]
  207,075,866,954,057 dTLB-load-misses          #    0.00% of all dTLB cache hits  [ 0.00%]
                    0 iTLB-loads                #    0.000 K/sec                   [ 0.00%]
   34,929,601,635,080 iTLB-load-misses          #    0.00% of all iTLB cache hits  [ 0.00%]
  135,034,208,792,378 L1-dcache-prefetches      # 14137901.839 M/sec               [ 0.00%]
  173,581,104,967,126 L1-dcache-prefetch-misses # 18173710.536 M/sec               [ 0.00%]

      30.588279372 seconds time elapsed


> Date: Wed, 4 Mar 2015 20:40:32 +0300
> From: aleksey.shipilev at oracle.com
> To: ychuang_cn at hotmail.com; jmh-dev at openjdk.java.net
> Subject: Re: benchmark unaligned memory access
> 
> Hi YC,
> 
> These questions belong to StackOverflow, please ask it there. This list
> is for JMH development, not benchmark reviews.
> 
> On 04.03.2015 13:24, YC Huang wrote:
> > Hi, I'm trying to benchmark the cost difference between aligned and
> > unaligned memory access, but my test case doesn't show any obvious
> > performance difference when running on my laptop (Pentium 6200).
> 
> Did you consider the performance difference is not "obvious", if you
> can't replicate it easily? ;) All in all, you are trying to follow up on
> a very thin phenomenon, and probably the infrastructure costs dominate
> the read performance.
> 
> > Am I doing something wrong in the Test? 
> 
> The test looks OK, except for Unsafe instance might reside in "static
> *final*" field.
> 
> -Aleksey.
>