Official support for Unsafe

Mon Jan 15 17:39:22 UTC 2024

By the way, I don't exactly know what you mean when you say that your 
machine shows/expresses dependency bound :-)

Maurizio

On 15/01/2024 17:05, Quân Anh Mai wrote:
> Sure, I just thought that looking at the instruction count would be 
> more helpful, since each machine would express different performance 
> behaviours. For example, my machine shows dependency bound going from 
> [2] to [1] below, which leads to a much smaller margin of execution 
> time compared to the margin measured by other machines (such as the 
> test machine). The third implementation is similar to the first one, 
> except I use safe accesses in the form of bounded memory segment 
> accesses and varhandles.
>
> For the JMH measurements, I define an execute function as follows; the
> numbers for the three versions are listed after it:
>
>     @Benchmark
>     public PoorManMap execute() throws IOException {
>         try (var file = FileChannel.open(Path.of(FILE), StandardOpenOption.READ);
>              var arena = Arena.ofShared()) {
>             var data = file.map(MapMode.READ_ONLY, 0, file.size(), arena);
>             return processFile(data, 0, data.byteSize());
>         }
>     }
>
>     CalculateAverage_merykitty.execute      avgt    5  7.422 ± 0.093  ms/op // unsafe [1]
>     CalculateAverage_merykitty.execute      avgt    5  7.686 ± 0.181  ms/op // universe segment [2]
>     CalculateAverage_merykitty.execute      avgt    5  9.009 ± 0.058  ms/op // varhandle [3]
>
> [1]: https://github.com/merykitty/1brc/tree/main
> [2]: https://github.com/merykitty/1brc/tree/removeunsafe
> [3]: https://github.com/merykitty/1brc/tree/varhandles
>
> Best regards,
> Quan Anh
>
> On Tue, 16 Jan 2024 at 00:29, Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>
>     On 15/01/2024 15:44, Quân Anh Mai wrote:
>>     Running the same program on 1e6 lines results in only 9e9
>>     instructions, so I think the vast majority of the instruction
>>     count is of the compiled code. Not using the universe segment is
>>     roughly equivalent to my previous version, which would result in
>>     around 50% more instructions compared to using one, and almost
>>     double the instruction count of using Unsafe.
>
>     Without looking at the program some more, it's hard for me to make
>     some sense of these numbers. I'm surprised that you don't see any
>     difference when using unbounded segment compared to regular ones.
>     I wonder if the gap you are seeing is due to the JVM warming up,
>     rather than peak performances being worse. Have you tried
>     measuring peak performance with e.g. JMH? I would not expect to
>     see 20% difference there...
>
>     Maurizio
>
>>
>>     Regards,
>>     Quan Anh
>>
>>     On Mon, 15 Jan 2024 at 23:09, Maurizio Cimadamore
>>     <maurizio.cimadamore at oracle.com> wrote:
>>
>>         I think the increased instruction count is normal, as C2 had
>>         to do more work to optimize the bound checks away?
>>
>>         Is there any difference compared to the version that doesn't
>>         use the universe segment?
>>
>>         Maurizio
>>
>>         On 15/01/2024 13:52, Quân Anh Mai wrote:
>>>         Hi,
>>>
>>>         I have tried using a universe segment instead of Unsafe, and
>>>         storing the custom hashmap buffer off-heap instead of in a
>>>         byte array. The output of perf stat on the program is:
>>>
>>>          Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>>
>>>                   13573.70 msec task-clock:u              #   10.942 CPUs utilized
>>>                          0      context-switches:u        #    0.000 /sec
>>>                          0      cpu-migrations:u          #    0.000 /sec
>>>                     238460      page-faults:u             #   17.568 K/sec
>>>                61995179870      cycles:u                  #    4.567 GHz
>>>                  261830581      stalled-cycles-frontend:u #    0.42% frontend cycles idle
>>>                   93823680      stalled-cycles-backend:u  #    0.15% backend cycles idle
>>>               137976098809      instructions:u            #    2.23  insn per cycle
>>>                                                           #    0.00  stalled cycles per insn
>>>                18373313803      branches:u                #    1.354 G/sec
>>>                   43579782      branch-misses:u           #    0.24% of all branches
>>>
>>>                1.240504612 seconds time elapsed
>>>
>>>               12.841563000 seconds user
>>>                0.652428000 seconds sys
>>>
>>>         For comparison, this is the unsafe version:
>>>
>>>          Performance counter stats for 'sh calculate_average_merykittyunsafe.sh':
>>>
>>>                   13327.46 msec task-clock:u              #   11.202 CPUs utilized
>>>                          0      context-switches:u        #    0.000 /sec
>>>                          0      cpu-migrations:u          #    0.000 /sec
>>>                     269896      page-faults:u             #   20.251 K/sec
>>>                61258348752      cycles:u                  #    4.596 GHz
>>>                  639839262      stalled-cycles-frontend:u #    1.04% frontend cycles idle
>>>                  108018676      stalled-cycles-backend:u  #    0.18% backend cycles idle
>>>               113476168983      instructions:u            #    1.85  insn per cycle
>>>                                                           #    0.01  stalled cycles per insn
>>>                11442665370      branches:u                #  858.578 M/sec
>>>                   44590172      branch-misses:u           #    0.39% of all branches
>>>
>>>                1.189768677 seconds time elapsed
>>>
>>>               12.628512000 seconds user
>>>                0.620083000 seconds sys
>>>
>>>         This program running on my machine expresses dependency
>>>         bound so the difference in execution time is not as
>>>         significant as on the test machine but it can be seen that
>>>         removing Unsafe results in over 21% increase in instruction
>>>         count.
>>>
>>>         Regards,
>>>         Quan Anh
>>>
>>>         On Sat, 13 Jan 2024 at 01:29, Maurizio Cimadamore
>>>         <maurizio.cimadamore at oracle.com> wrote:
>>>
>>>
>>>             On 12/01/2024 17:26, Quân Anh Mai wrote:
>>>             > FYI, in my submission to 1brc, using Unsafe decreases the
>>>             > execution time from 3.25s to 2.57s on the test machine.
>>>
>>>             Just curious - what is the difference compared with the
>>>             everything segment trick?
>>>
>>>             (While I know it can't do on-heap access, perhaps you can
>>>             tweak the code to be all off-heap?)
>>>
>>>             Maurizio
>>>
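For readers unfamiliar with the "everything segment" (universe segment) trick
mentioned in the thread, here is a minimal sketch. It is not taken from any of
the linked branches. It assumes JDK 22 or later, where the java.lang.foreign
API is final, and the class and method names are made up for illustration.
The idea is to reinterpret the zero-address segment as a segment of size
Long.MAX_VALUE, so that any native address becomes a valid offset and the
bounds check is trivially satisfied; as noted above, this cannot reach
on-heap memory.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class UniverseSegmentSketch {
    // Hypothetical name. A zero-based segment covering the whole native
    // address space. reinterpret is a restricted method, so the JVM may
    // print a warning unless native access is enabled for the caller.
    static final MemorySegment UNIVERSE =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Regular bounded access: the offset is checked against the segment size.
    static long readBounded(MemorySegment segment, long offset) {
        return segment.get(ValueLayout.JAVA_LONG_UNALIGNED, offset);
    }

    // Universe-segment access: the raw address is used as the offset.
    static long readViaUniverse(long address) {
        return UNIVERSE.get(ValueLayout.JAVA_LONG_UNALIGNED, address);
    }

    // Writes a value through a bounded segment and reads it back both ways.
    static boolean roundTrip() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment data = arena.allocate(64);
            data.set(ValueLayout.JAVA_LONG_UNALIGNED, 8, 0x1122334455667788L);
            long bounded = readBounded(data, 8);
            long universe = readViaUniverse(data.address() + 8);
            return bounded == 0x1122334455667788L && bounded == universe;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

Note that the universe segment gives up the spatial and temporal safety of a
regular segment: reading through it after the arena is closed, or at an
address that was never mapped, is undefined behaviour in the same way an
Unsafe access would be.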