Official support for Unsafe

Mon Jan 15 17:05:02 UTC 2024

Sure, I just thought that looking at the instruction count would be more
helpful, since each machine exhibits different performance behaviour.
For example, my machine is dependency-bound going from [2] to [1] below,
which leads to a much smaller difference in execution time than the one
measured on other machines (such as the test machine). The third
implementation is similar to the first one, except that it uses safe
accesses in the form of bounded memory segment accesses and varhandles.
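
To make the terminology concrete, here is a minimal sketch of the
difference between a bounded segment access and a universe segment access.
It is not the code from the branches linked below: the class and method
names are made up for illustration, and it assumes JDK 22 or later, where
java.lang.foreign is final. reinterpret is a restricted method, so the JVM
may print a native-access warning.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration only; not taken from the 1brc submission.
final class SegmentAccessSketch {
    // "Universe" segment: base address 0 and the largest possible size, so a
    // raw native address can be used directly as an offset into it.
    static final MemorySegment UNIVERSE =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Bounded access: every read is checked against the bounds of `data`.
    static long sumBounded(MemorySegment data) {
        long sum = 0;
        for (long i = 0; i < data.byteSize(); i++) {
            sum += data.get(ValueLayout.JAVA_BYTE, i);
        }
        return sum;
    }

    // Universe access: reads are only checked against the huge universe
    // size, so the caller is responsible for passing a valid address range.
    static long sumUniverse(long address, long size) {
        long sum = 0;
        for (long i = 0; i < size; i++) {
            sum += UNIVERSE.get(ValueLayout.JAVA_BYTE, address + i);
        }
        return sum;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment data = arena.allocate(64);
            for (long i = 0; i < data.byteSize(); i++) {
                data.set(ValueLayout.JAVA_BYTE, i, (byte) i);
            }
            // Both calls read the same 64 bytes of off-heap memory.
            System.out.println(sumBounded(data));
            System.out.println(sumUniverse(data.address(), data.byteSize()));
        }
    }
}
```

The universe variant only works for off-heap memory, since heap segments
have no stable native address.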

I define the following execute function as the benchmark. The JMH numbers
for the three versions follow it:

    @Benchmark
    public PoorManMap execute() throws IOException {
        try (var file = FileChannel.open(Path.of(FILE), StandardOpenOption.READ);
             var arena = Arena.ofShared()) {
            var data = file.map(MapMode.READ_ONLY, 0, file.size(), arena);
            return processFile(data, 0, data.byteSize());
        }
    }

    CalculateAverage_merykitty.execute      avgt    5  7.422 ± 0.093  ms/op  // unsafe [1]
    CalculateAverage_merykitty.execute      avgt    5  7.686 ± 0.181  ms/op  // universe segment [2]
    CalculateAverage_merykitty.execute      avgt    5  9.009 ± 0.058  ms/op  // varhandle [3]

[1]: https://github.com/merykitty/1brc/tree/main
[2]: https://github.com/merykitty/1brc/tree/removeunsafe
[3]: https://github.com/merykitty/1brc/tree/varhandles
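
As a quick sanity check on the figures in this thread, the relative
differences can be computed directly. This is only arithmetic on the
numbers quoted here (the JMH scores above and the instructions:u counts
from the perf stat output further down); the class and method names are
made up for illustration.

```java
// Hypothetical helper; only the input numbers come from this thread.
final class RelativeOverhead {
    // Percentage by which `value` exceeds `baseline`.
    static double percentIncrease(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // JMH average times in ms/op, with unsafe [1] as the baseline.
        System.out.printf("universe segment vs unsafe (time): %.2f%%%n",
                percentIncrease(7.422, 7.686));
        System.out.printf("varhandle vs unsafe (time): %.2f%%%n",
                percentIncrease(7.422, 9.009));
        // instructions:u from the two perf stat runs.
        System.out.printf("universe segment vs unsafe (instructions): %.2f%%%n",
                percentIncrease(113476168983.0, 137976098809.0));
    }
}
```

This gives roughly 3.6% and 21.4% for the JMH times, and roughly 21.6% for
the instruction count.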

Best regards,
Quan Anh

On Tue, 16 Jan 2024 at 00:29, Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

>
> On 15/01/2024 15:44, Quân Anh Mai wrote:
>
> Running the same program on 1e6 lines results in only 9e9 instructions, so
> I think the vast majority of the instruction count is of the compiled code.
> Not using the universe segment is roughly equivalent to my previous
> version, which would result in around 50% more instructions compared to
> using one, and almost double the instruction count of using Unsafe.
>
> Without looking at the program some more, it's hard for me to make some
> sense of these numbers. I'm surprised that you don't see any difference
> when using unbounded segment compared to regular ones. I wonder if the gap
> you are seeing is due to the JVM warming up, rather than peak performances
> being worse. Have you tried measuring peak performance with e.g. JMH? I
> would not expect to see 20% difference there...
>
> Maurizio
>
>
> Regards,
> Quan Anh
>
> On Mon, 15 Jan 2024 at 23:09, Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>> I think the increased instruction count is normal, as C2 had to do more
>> work to optimize the bound checks away?
>>
>> Is there any difference compared to the version that doesn't use the
>> universe segment?
>>
>> Maurizio
>> On 15/01/2024 13:52, Quân Anh Mai wrote:
>>
>> Hi,
>>
>> I have tried using a universe segment instead of Unsafe, and storing the
>> custom hashmap buffer off-heap instead of in a byte array. The output of
>> perf stat on the program is:
>>
>>  Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>
>>           13573.70 msec task-clock:u              #   10.942 CPUs utilized
>>                  0      context-switches:u        #    0.000 /sec
>>                  0      cpu-migrations:u          #    0.000 /sec
>>             238460      page-faults:u             #   17.568 K/sec
>>        61995179870      cycles:u                  #    4.567 GHz
>>          261830581      stalled-cycles-frontend:u #    0.42% frontend cycles idle
>>           93823680      stalled-cycles-backend:u  #    0.15% backend cycles idle
>>       137976098809      instructions:u            #    2.23  insn per cycle
>>                                                   #    0.00  stalled cycles per insn
>>        18373313803      branches:u                #    1.354 G/sec
>>           43579782      branch-misses:u           #    0.24% of all branches
>>
>>        1.240504612 seconds time elapsed
>>
>>       12.841563000 seconds user
>>        0.652428000 seconds sys
>>
>> For comparison, this is the unsafe version:
>>
>>  Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>
>>           13327.46 msec task-clock:u              #   11.202 CPUs utilized
>>                  0      context-switches:u        #    0.000 /sec
>>                  0      cpu-migrations:u          #    0.000 /sec
>>             269896      page-faults:u             #   20.251 K/sec
>>        61258348752      cycles:u                  #    4.596 GHz
>>          639839262      stalled-cycles-frontend:u #    1.04% frontend cycles idle
>>          108018676      stalled-cycles-backend:u  #    0.18% backend cycles idle
>>       113476168983      instructions:u            #    1.85  insn per cycle
>>                                                   #    0.01  stalled cycles per insn
>>        11442665370      branches:u                #  858.578 M/sec
>>           44590172      branch-misses:u           #    0.39% of all branches
>>
>>        1.189768677 seconds time elapsed
>>
>>       12.628512000 seconds user
>>        0.620083000 seconds sys
>>
>> On my machine this program is dependency-bound, so the difference in
>> execution time is not as significant as on the test machine, but it can
>> be seen that removing Unsafe results in an increase of over 21% in
>> instruction count.
>>
>> Regards,
>> Quan Anh
>>
>> On Sat, 13 Jan 2024 at 01:29, Maurizio Cimadamore <
>> maurizio.cimadamore at oracle.com> wrote:
>>
>>>
>>> On 12/01/2024 17:26, Quân Anh Mai wrote:
>>> > FYI, in my submission to 1brc, using Unsafe decreases the execution
>>> > time from 3.25s to 2.57s on the test machine.
>>>
>>> Just curious - what is the difference compared with the everything
>>> segment trick?
>>>
>>> (While I know it can't do on-heap access, perhaps you can tweak the code
>>> to be all off-heap?)
>>>
>>> Maurizio
>>>
>>>