Official support for Unsafe
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Jan 15 17:39:22 UTC 2024
By the way, I don't exactly know what you mean when you say that your
machine shows/expresses dependency bound :-)
Maurizio
On 15/01/2024 17:05, Quân Anh Mai wrote:
> Sure, I just thought that looking at the instruction count would be
> more helpful, since each machine exhibits different performance
> behaviour. For example, my machine shows dependency bound going from
> [2] to [1] below, which leads to a much smaller margin of execution
> time compared to the margin measured by other machines (such as the
> test machine). The third implementation is similar to the first one,
> except I use safe accesses in the form of bounded memory segment
> accesses and varhandles.
>
> The JMH numbers for these versions are shown below. The benchmark is
> an execute function defined as:
>
> @Benchmark
> public PoorManMap execute() throws IOException {
>     try (var file = FileChannel.open(Path.of(FILE), StandardOpenOption.READ);
>          var arena = Arena.ofShared()) {
>         var data = file.map(MapMode.READ_ONLY, 0, file.size(), arena);
>         return processFile(data, 0, data.byteSize());
>     }
> }
>
> CalculateAverage_merykitty.execute  avgt  5  7.422 ± 0.093  ms/op  // unsafe [1]
> CalculateAverage_merykitty.execute  avgt  5  7.686 ± 0.181  ms/op  // universe segment [2]
> CalculateAverage_merykitty.execute  avgt  5  9.009 ± 0.058  ms/op  // varhandle [3]
>
> [1]: https://github.com/merykitty/1brc/tree/main
> [2]: https://github.com/merykitty/1brc/tree/removeunsafe
> [3]: https://github.com/merykitty/1brc/tree/varhandles
>
> Best regards,
> Quan Anh
>
> On Tue, 16 Jan 2024 at 00:29, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
>
> On 15/01/2024 15:44, Quân Anh Mai wrote:
>> Running the same program on 1e6 lines results in only 9e9
>> instructions, so I think the vast majority of the instruction
>> count is of the compiled code. Not using the universe segment is
>> roughly equivalent to my previous version, which would result in
>> around 50% more instructions compared to using one, and almost
>> double the instruction count of using Unsafe.
>
> Without looking at the program some more, it's hard for me to make
> sense of these numbers. I'm surprised that you don't see any
> difference when using an unbounded segment compared to regular ones.
> I wonder if the gap you are seeing is due to the JVM warming up,
> rather than peak performance being worse. Have you tried
> measuring peak performance with e.g. JMH? I would not expect to
> see a 20% difference there...
>
> Maurizio
>
>>
>> Regards,
>> Quan Anh
>>
>> On Mon, 15 Jan 2024 at 23:09, Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> I think the increased instruction count is normal, as C2 had
>> to do more work to optimize the bound checks away?
>>
>> Is there any difference compared to the version that doesn't
>> use the universe segment?
>>
>> Maurizio
>>
>> On 15/01/2024 13:52, Quân Anh Mai wrote:
>>> Hi,
>>>
>>> I have tried using a universe segment instead of Unsafe, and
>>> storing the custom hashmap buffer off-heap instead of in
>>> a byte array. The output of perf stat on the program is:
>>>
>>> Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>>
>>>        13573.70 msec task-clock:u                #   10.942 CPUs utilized
>>>               0      context-switches:u          #    0.000 /sec
>>>               0      cpu-migrations:u            #    0.000 /sec
>>>          238460      page-faults:u               #   17.568 K/sec
>>>     61995179870      cycles:u                    #    4.567 GHz
>>>       261830581      stalled-cycles-frontend:u   #    0.42% frontend cycles idle
>>>        93823680      stalled-cycles-backend:u    #    0.15% backend cycles idle
>>>    137976098809      instructions:u              #    2.23 insn per cycle
>>>                                                  #    0.00 stalled cycles per insn
>>>     18373313803      branches:u                  #    1.354 G/sec
>>>        43579782      branch-misses:u             #    0.24% of all branches
>>>
>>>     1.240504612 seconds time elapsed
>>>
>>>    12.841563000 seconds user
>>>     0.652428000 seconds sys
>>>
>>> For comparison, this is the unsafe version:
>>>
>>> Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>>
>>>        13327.46 msec task-clock:u                #   11.202 CPUs utilized
>>>               0      context-switches:u          #    0.000 /sec
>>>               0      cpu-migrations:u            #    0.000 /sec
>>>          269896      page-faults:u               #   20.251 K/sec
>>>     61258348752      cycles:u                    #    4.596 GHz
>>>       639839262      stalled-cycles-frontend:u   #    1.04% frontend cycles idle
>>>       108018676      stalled-cycles-backend:u    #    0.18% backend cycles idle
>>>    113476168983      instructions:u              #    1.85 insn per cycle
>>>                                                  #    0.01 stalled cycles per insn
>>>     11442665370      branches:u                  #  858.578 M/sec
>>>        44590172      branch-misses:u             #    0.39% of all branches
>>>
>>>     1.189768677 seconds time elapsed
>>>
>>>    12.628512000 seconds user
>>>     0.620083000 seconds sys
>>>
>>> This program running on my machine expresses dependency
>>> bound so the difference in execution time is not as
>>> significant as on the test machine but it can be seen that
>>> removing Unsafe results in over 21% increase in instruction
>>> count.
>>>
>>> Regards,
>>> Quan Anh
>>>
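[Editor's note: the "over 21%" figure in the message above can be checked directly from the two perf stat listings. The sketch below is not part of the original thread. The class and method names are hypothetical, and it assumes the first listing is the universe segment run and the second is the Unsafe run, as the surrounding text states.]

```java
public class PerfRatios {
    // Counter values copied verbatim from the two perf stat listings in the thread.
    // "SEGMENT" = first listing (universe segment), "UNSAFE" = second listing.
    static final long SEGMENT_INSTRUCTIONS = 137_976_098_809L;
    static final long UNSAFE_INSTRUCTIONS  = 113_476_168_983L;
    static final long SEGMENT_CYCLES       = 61_995_179_870L;
    static final long UNSAFE_CYCLES        = 61_258_348_752L;
    static final long SEGMENT_BRANCHES     = 18_373_313_803L;
    static final long UNSAFE_BRANCHES      = 11_442_665_370L;

    /** Percentage increase when going from base to value. */
    static double percentIncrease(long base, long value) {
        return 100.0 * (value - base) / base;
    }

    public static void main(String[] args) {
        System.out.printf("instructions: +%.1f%%%n",
                percentIncrease(UNSAFE_INSTRUCTIONS, SEGMENT_INSTRUCTIONS));
        System.out.printf("branches:     +%.1f%%%n",
                percentIncrease(UNSAFE_BRANCHES, SEGMENT_BRANCHES));
        System.out.printf("cycles:       +%.1f%%%n",
                percentIncrease(UNSAFE_CYCLES, SEGMENT_CYCLES));
    }
}
```

[With these inputs the instruction count rises by about 21.6% and the branch count by about 60.6%, while cycles rise by only about 1.2%. That is consistent with the small difference in elapsed time reported in the listings.]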
>>> On Sat, 13 Jan 2024 at 01:29, Maurizio Cimadamore
>>> <maurizio.cimadamore at oracle.com> wrote:
>>>
>>>
>>> On 12/01/2024 17:26, Quân Anh Mai wrote:
>>> > FYI, in my submission to 1brc, using Unsafe decreases the execution
>>> > time from 3.25s to 2.57s on the test machine.
>>>
>>> Just curious - what is the difference compared with the everything
>>> segment trick?
>>>
>>> (While I know it can't do on-heap access, perhaps you can tweak the code
>>> to be all off-heap?)
>>>
>>> Maurizio
>>>
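[Editor's note: for readers unfamiliar with the "everything segment" (also called the "universe segment" in this thread), here is a minimal sketch of what the term is assumed to mean: a zero-based native segment reinterpreted to span the whole address space, so that raw addresses can be used as offsets. It is not taken from the linked repositories. The class and method names are hypothetical, and it assumes Java 22 or later, where the FFM API is final. MemorySegment.reinterpret is a restricted method, so the JVM may print a warning unless native access is enabled, e.g. with --enable-native-access=ALL-UNNAMED.]

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class EverythingSegment {
    // A native segment starting at address 0 and covering the whole address
    // space. Bounds checks against it always pass, but it can only reach
    // off-heap memory, never Java arrays.
    static final MemorySegment EVERYTHING =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    /** Safe access: the offset is checked against the segment's own bounds. */
    static byte readBounded(MemorySegment segment, long offset) {
        return segment.get(ValueLayout.JAVA_BYTE, offset);
    }

    /** Unsafe-like access: the raw address is used as the offset. */
    static byte readUnbounded(long address) {
        return EVERYTHING.get(ValueLayout.JAVA_BYTE, address);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment data = arena.allocate(16);
            data.set(ValueLayout.JAVA_BYTE, 3, (byte) 42);

            // Both reads hit the same byte of off-heap memory.
            byte viaBounded = readBounded(data, 3);
            byte viaEverything = readUnbounded(data.address() + 3);
            System.out.println(viaBounded + " " + viaEverything);
        }
    }
}
```

[As with Unsafe, reading through the unbounded segment at an address that is not backed by live memory is undefined and may crash the JVM. The bounded read throws IndexOutOfBoundsException instead.]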
More information about the amber-dev
mailing list