<div dir="ltr">Running the same program on 1e6 lines takes only 9e9 instructions, so I think the vast majority of the instruction count comes from the compiled code. Not using the universe segment is roughly equivalent to my previous version, which executes around 50% more instructions than the universe-segment version and almost double the instruction count of the Unsafe version.<div><br></div><div>Regards,</div><div>Quan Anh</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 15 Jan 2024 at 23:09, Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com">maurizio.cimadamore@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div>
<p>I think the increased instruction count is normal, as C2 had to
do more work to optimize the bounds checks away?</p>
<p>Is there any difference compared to the version that doesn't use
the universe segment?</p>
<p>Maurizio<br>
</p>
<div>On 15/01/2024 13:52, Quân Anh Mai
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi,</div>
<div><br>
</div>
<div>I have tried using a universe segment instead of Unsafe,
and storing the custom hashmap buffer off-heap instead of
in a byte array. This is the output of perf stat on the program:</div>
<div><br>
</div>
Performance counter stats for 'sh
calculate_average_merykittyunsafe.sh':<br>
<br>
13573.70 msec task-clock:u # 10.942
CPUs utilized<br>
0 context-switches:u # 0.000
/sec<br>
0 cpu-migrations:u # 0.000
/sec<br>
238460 page-faults:u # 17.568
K/sec<br>
61995179870 cycles:u # 4.567 GHz<br>
261830581 stalled-cycles-frontend:u # 0.42%
frontend cycles idle<br>
93823680 stalled-cycles-backend:u # 0.15%
backend cycles idle<br>
137976098809 instructions:u # 2.23
insn per cycle<br>
# 0.00
stalled cycles per insn<br>
18373313803 branches:u # 1.354
G/sec<br>
43579782 branch-misses:u # 0.24% of
all branches<br>
<br>
1.240504612 seconds time elapsed<br>
<br>
12.841563000 seconds user<br>
0.652428000 seconds sys
<div><br>
</div>
<div>For comparison, this is the Unsafe version:<br>
<div><br>
</div>
<div> Performance counter stats for 'sh
calculate_average_merykittyunsafe.sh':<br>
<br>
13327.46 msec task-clock:u # 11.202
CPUs utilized<br>
0 context-switches:u # 0.000
/sec<br>
0 cpu-migrations:u # 0.000
/sec<br>
269896 page-faults:u # 20.251
K/sec<br>
61258348752 cycles:u # 4.596
GHz<br>
639839262 stalled-cycles-frontend:u # 1.04%
frontend cycles idle<br>
108018676 stalled-cycles-backend:u # 0.18%
backend cycles idle<br>
113476168983 instructions:u # 1.85
insn per cycle<br>
# 0.01
stalled cycles per insn<br>
11442665370 branches:u # 858.578
M/sec<br>
44590172 branch-misses:u # 0.39%
of all branches<br>
<br>
1.189768677 seconds time elapsed<br>
<br>
12.628512000 seconds user<br>
0.620083000 seconds sys<br>
</div>
</div>
<div><br>
</div>
<div>On my machine this program is dependency-bound, so the
difference in execution time is not as significant as on the
test machine, but it can be seen that removing Unsafe results
in an increase of over 21% in instruction count.</div>
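<div>As a quick check of the figures above, here is a minimal sketch that recomputes the relative increase and the instructions-per-cycle values from the two perf stat outputs. The class and method names are illustrative only; the counter values are copied from the listings.</div>

```java
public class InstructionRatio {
    // Relative increase of `measured` over `baseline`, in percent.
    static double percentIncrease(long baseline, long measured) {
        return 100.0 * (measured - baseline) / baseline;
    }

    // Instructions retired per cycle.
    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    public static void main(String[] args) {
        // Counter values taken from the perf stat listings above.
        long segmentInsns = 137_976_098_809L;
        long segmentCycles = 61_995_179_870L;
        long segmentBranches = 18_373_313_803L;
        long unsafeInsns = 113_476_168_983L;
        long unsafeCycles = 61_258_348_752L;
        long unsafeBranches = 11_442_665_370L;

        System.out.printf("Instruction increase: %.2f%%%n",
                percentIncrease(unsafeInsns, segmentInsns));
        System.out.printf("Branch increase: %.2f%%%n",
                percentIncrease(unsafeBranches, segmentBranches));
        System.out.printf("IPC (segment): %.2f%n", ipc(segmentInsns, segmentCycles));
        System.out.printf("IPC (Unsafe): %.2f%n", ipc(unsafeInsns, unsafeCycles));
    }
}
```

<div>The cycle counts differ far less than the instruction counts, which fits the observation that the extra instructions are mostly absorbed by a higher IPC on this machine.</div>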
<div><br>
</div>
<div>Regards,</div>
<div>Quan Anh</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, 13 Jan 2024 at 01:29,
Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank">maurizio.cimadamore@oracle.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
On 12/01/2024 17:26, Quân Anh Mai wrote:<br>
> FYI, in my submission to 1brc, using Unsafe decreases the
execution <br>
> time from 3.25s to 2.57s on the test machine.<br>
<br>
Just curious - what is the difference compared with the
everything <br>
segment trick?<br>
<br>
(While I know it can't do on-heap access, perhaps you can
tweak the code <br>
to be all off-heap?)<br>
<br>
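For reference, a minimal sketch of the everything segment trick, assuming Java 22 or later (where the FFM API is final). The class and helper names are hypothetical and this is not the code from the 1brc submission. It reinterprets MemorySegment.NULL as one zero-based segment spanning the address space, so a raw address can be used directly as an offset, which only works for off-heap memory. Note that reinterpret is a restricted method and may print a native-access warning.<br>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class EverythingSegment {
    // One segment based at address 0 and covering the whole address space.
    // Bounds checks against it compare with a constant size.
    static final MemorySegment EVERYTHING =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Read a long at a raw native address (the address is the offset).
    static long readLong(long address) {
        return EVERYTHING.get(ValueLayout.JAVA_LONG_UNALIGNED, address);
    }

    // Write a long at a raw native address.
    static void writeLong(long address, long value) {
        EVERYTHING.set(ValueLayout.JAVA_LONG_UNALIGNED, address, value);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Off-heap buffer standing in for the hashmap storage.
            MemorySegment buffer = arena.allocate(64, 8);
            long base = buffer.address();

            writeLong(base + 8, 42L);
            // Both reads observe the same memory.
            System.out.println(readLong(base + 8));
            System.out.println(buffer.get(ValueLayout.JAVA_LONG_UNALIGNED, 8));
        }
    }
}
```

As with Unsafe, nothing stops such a segment from touching an invalid address, so the caller has to keep every address inside memory it owns.<br>
<br>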
Maurizio<br>
<br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>