Unsafe vs MemorySegments / Bounds checking...

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Oct 29 17:44:45 UTC 2024


Unfortunately we have never been able to come up with a reproducer for 
the slow down you are experiencing.

If/when you have a standalone benchmark which shows the issue we will 
obviously take a look at it.

On 29/10/2024 15:39, Brian S O'Neill wrote:
>
> It looks like (again) the HotSpot inliner isn't doing enough to 
> transform the code into the plain internal unsafe code. At the very 
> least, I think there should be a convenience API which doesn't require 
> me to apply a special transform step. The implementation could then at 
> least employ the magic "force inline" annotations to ensure that 
> there are no performance regressions.

Well, if it was that easy :-)

99.99% of the work associated with such issues is to understand _where_ 
adding ForceInline might be beneficial. Just blanket-adding that 
everywhere will likely make your code slower, not faster.


In general 99% of the cost associated with bound checks can be disabled 
by using segments whose length is Long.MAX_VALUE. But, as we have 
learned when looking at some of your examples, this doesn't fully 
eliminate all the costs because there's still a sign check involved: 
e.g. the offset into the memory segment must be >= 0 - and there's not 
much C2 can do to eliminate that at the moment.
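
A minimal sketch of the "segment whose length is Long.MAX_VALUE" idea, 
assuming JDK 22 or later. The class and method names are hypothetical. 
MemorySegment::reinterpret is a restricted method, so the JVM may print a 
warning unless native access is enabled for the calling module. The last 
method illustrates the sign check that remains: a negative offset is still 
rejected even though the upper bound is effectively unlimited.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class EverythingSegmentSketch {
    // Zero-based segment spanning the whole address space (hypothetical name).
    // Offsets into it are absolute native addresses, much like Unsafe.
    static final MemorySegment ALL = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    static long readLong(long address) {
        return ALL.get(ValueLayout.JAVA_LONG_UNALIGNED, address);
    }

    static void writeLong(long address, long value) {
        ALL.set(ValueLayout.JAVA_LONG_UNALIGNED, address, value);
    }

    // Writes through the "everything" segment, reads back through a normal one.
    static long roundTrip(long value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG);
            writeLong(seg.address(), value);
            return seg.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    // The offset sign check is still there: a negative offset is out of bounds.
    static boolean rejectsNegativeOffset() {
        try {
            ALL.get(ValueLayout.JAVA_BYTE, -1L);
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }
}
```

Nothing here is safe: the caller is responsible for only passing addresses 
of live memory, exactly as with Unsafe.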

Of course, when a segment is accessed in a loop, none of this matters - 
checks will be hoisted out of the loops, and any added cost will be 
amortized.
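
The loop case can be sketched as follows, again assuming JDK 22 or later 
and with hypothetical names. This is the access shape where the checks can 
be hoisted; whether C2 actually hoists them in a given program depends on 
the JIT and is not something this snippet demonstrates.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class LoopAccessSketch {
    // Sequential access with a loop bound derived from the segment size.
    static long sumBytes(MemorySegment segment) {
        long sum = 0;
        long size = segment.byteSize();
        for (long i = 0; i < size; i++) {
            sum += segment.get(ValueLayout.JAVA_BYTE, i);
        }
        return sum;
    }

    // Allocates 1024 bytes, sets each to 1, and sums them.
    static long demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(1024);
            seg.fill((byte) 1);
            return sumBytes(seg);
        }
    }
}
```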

But if code accesses a memory segment (or a byte buffer, or...) in a 
"random" fashion, then some of these additional costs might show up (and 
some are, I think, unavoidable).

Popping back up 100 levels: your message seems to imply that a 
requirement for deprecating Unsafe is that we should have a replacement 
API which offers 100% of Unsafe performance _and_ it is safe. I think 
this angle is rather unworkable. That said, I don't want to fully close 
the door to investigating whether there's a better "escape hatch" we can 
express within FFM (e.g. using restricted methods) to support corner 
cases where existing optimizations might not work too well. But to do 
that, we need some benchmark to look at (preferably one that doesn't 
pull in an entire project).

Maurizio

>
>
> On 2024-10-29 03:02 AM, Maurizio Cimadamore wrote:
>>
>> On 27/10/2024 14:19, Johannes Lichtenberger wrote:
>>> Hello,
>>>
>>> I've watched the Devoxx talk[1] from Thomas and Roy about the 1 
>>> billion rows challenge. I assume Unsafe is going to be deprecated 
>>> and removed at some point, but it seems for max performance it's 
>>> still a good fit (of course if you know what to do). Wouldn't it be 
>>> also possible to remove bounds checks from MemorySegments if you 
>>> really want for instance via some JVM flag or something? Of course I 
>>> know the spatial and temporal bounds checks are a feature and very 
>>> nice (in most cases) :-)
>> Hi,
>>
>> the process for Deprecating Unsafe has already started:
>>
>> https://openjdk.org/jeps/471
>>
>> Re. 1brc, that was a fun challenge to watch. My general sense is that 
>> results are a bit skewed towards the "extreme" - e.g. I don't think 
>> many Java developers would really like to read (or write!) code that 
>> looks like those in the top 10. For instance, in most instances I've 
>> seen, the top 10 examples use some way to memory map a file, but they 
>> never do the unmap (this is done by using a global arena) which I'm 
>> not sure will be considered "good practice". If you write code like 
>> that, and peak performance is your only concern, then yes, you can 
>> get into a place where bound checks matter (a lot) in terms of CPU 
>> cycles. But I'm not sure how well that translates to us mortals :-)
>>
>> That said, note that there are at least a couple of Unsafe-less 
>> solutions that get pretty close to the top spot:
>>
>> https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_gonix.java
>>
>> https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_merykitty.java
>>
>> One is based on ByteBuffer, the other on MemorySegment. They do their 
>> job in 3s - the top spot does it in 1.5. So... not too bad!
>>
>> Maurizio
>>
>>
>

