Official support for Unsafe
Quân Anh Mai
anhmdq at gmail.com
Mon Jan 15 17:05:02 UTC 2024
Sure. I just thought that looking at the instruction count would be more
helpful, since different machines exhibit different performance behaviours.
For example, my machine is dependency-bound when going from [2] to [1] below,
which leads to a much smaller difference in execution time than the one
measured on other machines (such as the test machine). The third
implementation is similar to the first one, except that it uses safe accesses
in the form of bounded memory segment accesses and VarHandles.
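For readers less familiar with the FFM API, the two kinds of bounded access mentioned above can be sketched as follows. This is a minimal illustration, assuming JDK 22 or later (where java.lang.foreign is final and a value layout's var handle takes (MemorySegment, long offset) coordinates); the class and method names are hypothetical and are not taken from the linked branches.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class BoundedAccess {
    // Unaligned layout: a segment backed by a byte[] only guarantees 1-byte alignment.
    static final ValueLayout.OfLong LONG = ValueLayout.JAVA_LONG_UNALIGNED;

    // Assumes JDK 22+: coordinates are (MemorySegment segment, long offset).
    static final VarHandle LONG_HANDLE = LONG.varHandle();

    // Bounded read through the segment API; throws IndexOutOfBoundsException
    // if the 8 bytes starting at offset are not inside the segment.
    static long readWithGet(MemorySegment segment, long offset) {
        return segment.get(LONG, offset);
    }

    // The same bounded read expressed through the VarHandle.
    static long readWithVarHandle(MemorySegment segment, long offset) {
        return (long) LONG_HANDLE.get(segment, offset);
    }

    // Reports whether reading 8 bytes at offset fails the bounds check.
    static boolean outOfBounds(MemorySegment segment, long offset) {
        try {
            segment.get(LONG, offset);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MemorySegment segment = MemorySegment.ofArray(new byte[16]);
        segment.set(LONG, 4, 0x1122334455667788L);
        System.out.println(readWithGet(segment, 4) == readWithVarHandle(segment, 4)); // true
        System.out.println(outOfBounds(segment, 12)); // true: bytes 12..19 do not fit in 16
    }
}
```

Both paths perform the same bounds check; whether C2 can hoist or remove it in a hot loop is what the instruction counts in this thread are probing.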
The JMH numbers for these versions are below. I define an execute function
as follows:
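import java.io.IOException;
import java.lang.foreign.Arena;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import org.openjdk.jmh.annotations.Benchmark;

// PoorManMap, FILE and processFile are defined elsewhere in the benchmark class.
@Benchmark
public PoorManMap execute() throws IOException {
    try (var file = FileChannel.open(Path.of(FILE), StandardOpenOption.READ);
         var arena = Arena.ofShared()) {
        // Map the whole file; the mapping is released when the arena is closed.
        var data = file.map(MapMode.READ_ONLY, 0, file.size(), arena);
        return processFile(data, 0, data.byteSize());
    }
}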
Benchmark                           Mode  Cnt  Score   Error  Units
CalculateAverage_merykitty.execute  avgt    5  7.422 ± 0.093  ms/op  // unsafe [1]
CalculateAverage_merykitty.execute  avgt    5  7.686 ± 0.181  ms/op  // universe segment [2]
CalculateAverage_merykitty.execute  avgt    5  9.009 ± 0.058  ms/op  // varhandle [3]
[1]: https://github.com/merykitty/1brc/tree/main
[2]: https://github.com/merykitty/1brc/tree/removeunsafe
[3]: https://github.com/merykitty/1brc/tree/varhandles
Best regards,
Quan Anh
On Tue, 16 Jan 2024 at 00:29, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
> On 15/01/2024 15:44, Quân Anh Mai wrote:
>
> Running the same program on 1e6 lines executes only 9e9 instructions, so
> I think the vast majority of the instruction count comes from the compiled
> code. Not using the universe segment is roughly equivalent to my previous
> version, which executes around 50% more instructions than the universe
> segment version, and almost double the instruction count of the Unsafe
> version.
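The "universe segment" (also called the everything segment) is a native segment that starts at address 0 and spans the whole address space, so code can address off-heap memory by absolute address much as it would with Unsafe. A minimal sketch, assuming JDK 22+; the class and method names are hypothetical, and MemorySegment.reinterpret is a restricted method, so the JVM prints a warning unless --enable-native-access is given.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class UniverseSegment {
    // A native segment starting at address 0 whose size covers every address.
    // reinterpret is restricted: a warning is printed without --enable-native-access.
    static final MemorySegment UNIVERSE = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Regular bounded access: each read is checked against region.byteSize().
    static long sumBounded(MemorySegment region) {
        long sum = 0;
        for (long i = 0; i < region.byteSize(); i++) {
            sum += region.get(ValueLayout.JAVA_BYTE, i);
        }
        return sum;
    }

    // Universe-segment access by absolute address: the bounds check is against
    // Long.MAX_VALUE, so reading past the real allocation is NOT detected
    // (as with Unsafe, a bad address can crash the JVM).
    static long sumUniverse(long address, long size) {
        long sum = 0;
        for (long i = 0; i < size; i++) {
            sum += UNIVERSE.get(ValueLayout.JAVA_BYTE, address + i);
        }
        return sum;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment region = arena.allocate(8);
            for (int i = 0; i < 8; i++) {
                region.set(ValueLayout.JAVA_BYTE, i, (byte) (i + 1));
            }
            System.out.println(sumBounded(region));                               // 36
            System.out.println(sumUniverse(region.address(), region.byteSize())); // 36
        }
    }
}
```

Both loops read the same bytes; the difference is only in which bounds the accesses are checked against.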
>
> Without looking at the program some more, it's hard for me to make sense
> of these numbers. I'm surprised that you don't see any difference when
> using an unbounded segment compared to regular ones. I wonder if the gap
> you are seeing is due to the JVM warming up, rather than to worse peak
> performance. Have you tried measuring peak performance with, e.g., JMH? I
> would not expect to see a 20% difference there...
>
> Maurizio
>
>
> Regards,
> Quan Anh
>
> On Mon, 15 Jan 2024 at 23:09, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
>> I think the increased instruction count is normal, as C2 had to do more
>> work to optimize the bounds checks away?
>>
>> Is there any difference compared to the version that doesn't use the
>> universe segment?
>>
>> Maurizio
>> On 15/01/2024 13:52, Quân Anh Mai wrote:
>>
>> Hi,
>>
>> I have tried using a universe segment instead of Unsafe, and storing the
>> custom hashmap buffer off-heap instead of in a byte array. This is the
>> output of perf stat on the program:
>>
>> Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>
>> 13573.70 msec task-clock:u # 10.942 CPUs utilized
>> 0 context-switches:u # 0.000 /sec
>> 0 cpu-migrations:u # 0.000 /sec
>> 238460 page-faults:u # 17.568 K/sec
>> 61995179870 cycles:u # 4.567 GHz
>> 261830581 stalled-cycles-frontend:u # 0.42% frontend
>> cycles idle
>> 93823680 stalled-cycles-backend:u # 0.15% backend
>> cycles idle
>> 137976098809 instructions:u # 2.23 insn per
>> cycle
>> # 0.00 stalled
>> cycles per insn
>> 18373313803 branches:u # 1.354 G/sec
>> 43579782 branch-misses:u # 0.24% of all
>> branches
>>
>> 1.240504612 seconds time elapsed
>>
>> 12.841563000 seconds user
>> 0.652428000 seconds sys
>>
>> For comparison, this is the unsafe version:
>>
>> Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>
>> 13327.46 msec task-clock:u # 11.202 CPUs utilized
>> 0 context-switches:u # 0.000 /sec
>> 0 cpu-migrations:u # 0.000 /sec
>> 269896 page-faults:u # 20.251 K/sec
>> 61258348752 cycles:u # 4.596 GHz
>> 639839262 stalled-cycles-frontend:u # 1.04% frontend
>> cycles idle
>> 108018676 stalled-cycles-backend:u # 0.18% backend
>> cycles idle
>> 113476168983 instructions:u # 1.85 insn per
>> cycle
>> # 0.01 stalled
>> cycles per insn
>> 11442665370 branches:u # 858.578 M/sec
>> 44590172 branch-misses:u # 0.39% of all
>> branches
>>
>> 1.189768677 seconds time elapsed
>>
>> 12.628512000 seconds user
>> 0.620083000 seconds sys
>>
>> On my machine this program is dependency-bound, so the difference in
>> execution time is not as significant as on the test machine, but removing
>> Unsafe increases the instruction count by over 21%.
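The percentages discussed in this thread can be recomputed from the two perf stat outputs quoted here. A small sketch, with the counter values copied verbatim from those outputs; the class, constant and method names are hypothetical. It also suggests why a 21.6% rise in instructions costs only about 1.2% more cycles: the universe segment version retires more instructions per cycle, which is consistent with the program being dependency-bound on this machine.

```java
public class PerfRatios {
    // Counter values copied from the two perf stat outputs in this thread.
    static final long SEGMENT_CYCLES = 61_995_179_870L;
    static final long SEGMENT_INSTRUCTIONS = 137_976_098_809L;
    static final long SEGMENT_BRANCHES = 18_373_313_803L;
    static final long UNSAFE_CYCLES = 61_258_348_752L;
    static final long UNSAFE_INSTRUCTIONS = 113_476_168_983L;
    static final long UNSAFE_BRANCHES = 11_442_665_370L;

    // Relative change of `after` over `before`, in percent.
    static double percentIncrease(long after, long before) {
        return 100.0 * (after - before) / before;
    }

    // Instructions retired per cycle (perf's "insn per cycle").
    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    public static void main(String[] args) {
        System.out.printf("instructions: +%.1f%%%n",
                percentIncrease(SEGMENT_INSTRUCTIONS, UNSAFE_INSTRUCTIONS)); // +21.6%
        System.out.printf("branches:     +%.1f%%%n",
                percentIncrease(SEGMENT_BRANCHES, UNSAFE_BRANCHES));         // +60.6%
        System.out.printf("cycles:       +%.1f%%%n",
                percentIncrease(SEGMENT_CYCLES, UNSAFE_CYCLES));             // +1.2%
        System.out.printf("IPC: %.2f (segment) vs %.2f (Unsafe)%n",
                ipc(SEGMENT_INSTRUCTIONS, SEGMENT_CYCLES),
                ipc(UNSAFE_INSTRUCTIONS, UNSAFE_CYCLES));                    // 2.23 vs 1.85
    }
}
```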
>>
>> Regards,
>> Quan Anh
>>
>> On Sat, 13 Jan 2024 at 01:29, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>
>>>
>>> On 12/01/2024 17:26, Quân Anh Mai wrote:
>>> > FYI, in my submission to 1brc, using Unsafe decreases the execution
>>> > time from 3.25s to 2.57s on the test machine.
>>>
>>> Just curious - what is the difference compared with the everything
>>> segment trick?
>>>
>>> (While I know it can't do on-heap access, perhaps you can tweak the code
>>> to be all off-heap?)
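Why the everything segment trick needs the data to be off-heap can be sketched as follows: a segment over a Java array is a heap segment with no native address, so only native memory can be reached through the universe segment. A minimal sketch, assuming JDK 22+; OffHeapBuffer, toOffHeap and the table contents are hypothetical, and a real hash table would more likely be allocated off-heap directly than copied from a byte[].

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffHeapBuffer {
    // The "everything segment": a native segment spanning all addresses.
    // reinterpret is restricted: a warning is printed without --enable-native-access.
    static final MemorySegment UNIVERSE = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Copies a heap buffer into native memory owned by the given arena, so that
    // it has a native address the universe segment can reach.
    static MemorySegment toOffHeap(byte[] heap, Arena arena) {
        MemorySegment offHeap = arena.allocate(heap.length);
        MemorySegment.copy(MemorySegment.ofArray(heap), 0, offHeap, 0, heap.length);
        return offHeap;
    }

    public static void main(String[] args) {
        byte[] table = {10, 20, 30, 40};
        // A segment over a Java array is a heap segment, not a native one,
        // so it cannot be accessed through UNIVERSE.
        System.out.println(MemorySegment.ofArray(table).isNative()); // false
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment offHeap = toOffHeap(table, arena);
            System.out.println(offHeap.isNative()); // true
            System.out.println(UNIVERSE.get(ValueLayout.JAVA_BYTE, offHeap.address() + 2)); // 30
        }
    }
}
```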
>>>
>>> Maurizio
>>>
>>>
More information about the amber-dev mailing list