Official support for Unsafe

Fri Jan 12 14:43:03 UTC 2024

Before I comment on your proposal, let's see what we can get through the 
means we have today.

As you have tried to do, you can use MemorySegment and its accessors. If 
your loops are counted, you should get loop unrolling, bounds-check 
elimination, and even auto-vectorization.
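
To make the counted-loop case concrete, here is a minimal sketch. The class and method names (`SegmentSum`, `sum`) are made up for illustration, and it assumes a JDK where FFM is final (22 or later). The loop has a simple induction variable and a loop-invariant limit, which is the shape the JIT can treat as counted. Whether it actually unrolls or vectorizes depends on the JVM version and the hardware.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical example class, not part of any library.
final class SegmentSum {
    // Sums every int in the segment using a counted loop.
    static long sum(MemorySegment segment) {
        // The limit is computed once, outside the loop, so it is loop-invariant.
        int count = Math.toIntExact(segment.byteSize() / ValueLayout.JAVA_INT.byteSize());
        long total = 0;
        for (int i = 0; i < count; i++) {
            // getAtIndex scales the index by the layout size (4 bytes here).
            total += segment.getAtIndex(ValueLayout.JAVA_INT, i);
        }
        return total;
    }

    public static void main(String[] args) {
        // A heap segment backed by an int[]; no arena or native access needed.
        MemorySegment segment = MemorySegment.ofArray(new int[] {1, 2, 3, 4});
        System.out.println(sum(segment)); // prints 10
    }
}
```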

The problem is when a loop is not counted, or runs for a number of 
iterations that is too small (so that even if checks are hoisted 
outside, you still execute them a lot).

There is an escape hatch in the FFM API, one that requires 
"--enable-native-access". It goes like this:

* Create an _everything_ segment, doing 
`MemorySegment.NULL.reinterpret(Long.MAX_VALUE)`
* Stash the everything segment in a static final constant (so that the 
JIT can see through its fields)
* Tweak your code to operate on the everything segment - e.g. this 
effectively becomes a replacement for an Unsafe::getXYZ taking a long 
address
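
The three steps above can be sketched roughly as follows. The names `RawMemory`, `getLong`, `putLong` and `roundTrip` are made up for illustration, and the sketch assumes a JDK where FFM is final (22 or later). `reinterpret` is a restricted method, so run with `--enable-native-access` for the relevant module (or `ALL-UNNAMED`) to avoid the warning. The unaligned layout is my choice here, to mimic Unsafe not checking alignment.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper class, not part of any library.
final class RawMemory {
    // Zero-based segment covering the whole address space. Keeping it in a
    // static final field lets the JIT treat it as a constant.
    private static final MemorySegment EVERYTHING =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Since the segment starts at address 0, the offset is the raw address.
    static long getLong(long address) {
        return EVERYTHING.get(ValueLayout.JAVA_LONG_UNALIGNED, address);
    }

    static void putLong(long address, long value) {
        EVERYTHING.set(ValueLayout.JAVA_LONG_UNALIGNED, address, value);
    }

    // Writes a value into freshly allocated native memory through the
    // everything segment and reads it back the same way.
    static long roundTrip(long value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment block = arena.allocate(8, 8); // 8 bytes, 8-byte aligned
            long address = block.address();
            putLong(address, value);
            return getLong(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L)); // prints 42
    }
}
```

As with Unsafe, nothing here stops you from passing an address that is stale or was never allocated, so the usual caveats apply.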

Note: this is still not 100% on par with Unsafe, because FFM still has 
to check that the address you pass is positive. But this is a much, much 
simpler check.

Hope this helps.

Maurizio

On 12/01/2024 14:14, Quân Anh Mai wrote:
> Hi,
>
> These days I have kept an eye on the 1brc challenge and 1 particular 
> phenomenon that has struck me is the usage of Unsafe by the 
> participants. While Java has developed a lot in terms of alternatives 
> for Unsafe such as VarHandle and FFM, a particular use case of Unsafe 
> is to access memory in an unsafe manner which cannot be done without 
> some kind of unsafe support.
>
> I believe that while the compiler can theoretically eliminate a lot of 
> bound checks and even if they exist, a predictable branch is normally 
> cheap, there will always be places where that is not the case, as the 
> access can be very far from the place where the compiler can get the 
> relevant information, or the bound checks will compete with the 
> bottlenecked CPU frontend or backend. In the cases where every 
> nanosecond counts, a bound check may be prohibitively expensive, and 
> the capability to bypass them would be valuable.
>
> Looking at other languages, C++ is unsafe by nature, C#, Go and even 
> Rust all provide the ability to step out of the safe realm. As a 
> result, I think the necessity of unsafe accesses is evident.
>
> Regarding the implementation, the support can start minimally with the 
> ability to access arrays or to interpret an array as another without 
> bound checking. Unsafe accesses require the program to start with 
> --enable-unsafe-accesses=<module-name>, the same as how we restrict 
> native accesses, which IMO exhibits a similar nature after the 
> introduction of passing heap segment to native downcalls. Last but not 
> least, unsafe accesses will still perform bound checks prior to C2 or 
> in the presence of a special unsafe flag, the former acts as a safety 
> net that may catch most mistakes the programmers make and the latter 
> is a sanity check when the need arises.
>
> Please let me know if there is any misunderstanding regarding the 
> situation, or if this is not the suitable mailing list for the proposal.
>
> Regards,
> Quan Anh