Official support for Unsafe proposal

Tue Jan 16 04:35:54 UTC 2024

On 16/01/2024 12:45 pm, Quân Anh Mai wrote:
> Hi,
> 
> I understand your concern but I still don't think it is evident. To say 
> that having an official support for Unsafe would give developers the 
> impression that "this is the way code should be written" seems like a 
> stretch to me. We have always had `sun.misc.Unsafe` 

And that was never supposed to be used by developers, and they did so at 
their own risk. Putting something public in java.lang is the strongest 
endorsement of an API and says this API is intended for general use.
I can't speak for the project lead but I can't see us ever creating an 
Unsafe variant as you describe - sorry. If you need specific 
functionality within an existing API (like Panama ones) then a case can 
be made for it.

Cheers,
David
-----

> and developers will 
> seek its support if they need to regardless. The proposal suggests 
> having a flag to disable bound checks, which would be a compromise 
> between Unsafe, which is really powerful and can be used without the 
> program noticing, and normal accesses. Having coded professionally in Go as 
> well as participated in and observed the development of the Go 
> programming language for the last 2 years, I have never personally seen 
> anyone writing 
> `*((*int)(unsafe.Pointer(uintptr(unsafe.Pointer(unsafe.SliceData(arr))) 
> + uintptr(i) * 8)))` instead of `arr[i]` and in general people actively 
> avoid using the `unsafe` package. Furthermore, as you have noted, some 
> of the techniques we have discovered can only be used on off-heap memory.
> 
> Of course this is only my personal opinion and I have little experience 
> in language design so feel free to neglect it. I am just concerned that 
> we are missing a crucial feature due to our excessive carefulness.
> 
> Thanks a lot,
> Quan Anh
> 
> On Tue, 16 Jan 2024 at 05:52, Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
> 
>     Hi Quân,
>     as touched on in the amber-dev mailing list, while I don't doubt that
>     adding something like unsafe blocks, or another Unsafe-like API, would
>     allow dropping bound checks in places where "the developer knows better
>     than the JIT", I'm a little skeptical of going down that path. First, I
>     think that what you describe is mostly an issue with randomly accessed
>     segments (as otherwise usual bound check elimination kicks in,
>     amortizing the costs associated with bound checks). Secondly, with some
>     of the techniques outlined (such as the use of an everything segment), it
>     seems that, at least as far as off-heap access goes, we can get quite
>     close to Unsafe (as your experiment demonstrates) - and, as that
>     solution relies on MemorySegment::reinterpret, running code relying on
>     it would require the --enable-native-access flag, which would be a sign
>     that the program is attempting to get rid of some of the usual safety
>     guarantees. Third, once you have a "blessed" way to get to unchecked
>     territory (esp. if that's a keyword, or an API in java.lang) it's easier
>     for developers to get the message that "this is the way code should be
>     written" - in reality performance is only _one_ of the many
>     considerations developers should keep in mind while coding (albeit a
>     very important one in a case like the 1BRC :-) ) - as a software
>     engineer working with other engineers and having to maintain a code base
>     that has to last _decades_ I'd be very concerned about the
>     maintainability of the solution as well (and the 1BRC shows pretty
>     clearly that there's a negative correlation between performance and
>     readability of the generated code).
> 
>     So, stepping back, I think we have to tread carefully: compromising the
>     integrity of the Java platform (_for everyone_) to get some extra
>     performance juice (_for the few who notice_) just doesn't seem like a
>     very good deal.
> 
>     We will, of course, continue to look into ways to make memory segment
>     access (and other forms of access) perform better and better (as happens
>     in every other part of the Java runtime). It might be possible we'll hit
>     some roadblock - in which case we might look at other solutions (perhaps
>     a VarHandle-based unchecked access factory), but something like
>     "java.lang.Unsafe" would be way too prominent and accessible for its own
>     good.
> 
>     Maurizio
> 
>     On 15/01/2024 19:48, Quân Anh Mai wrote:
>      > Hi Panama folks,
>      >
>      > This proposal was first announced by me on amber-dev, and it was later
>      > suggested that I move it to this mailing list. I will make a summary of
>      > the proposal here.
>      >
>      > Java has made much progress in providing powerful alternatives to the
>      > usage of sun.misc.Unsafe that still ensure safe behaviours. However,
>      > there is one important use case of Unsafe that cannot be substituted
>      > with other safe alternatives, that is the ability to access memory in
>      > an unsafe manner.
>      >
>      > For the vast majority of the cases, bound checks can either be
>      > eliminated by the compiler, or be negligibly cheap. However, there are
>      > always exceptions.
>      >
>      > - The compiler can theoretically eliminate a lot of bound checks, but
>      > there are cases where it cannot do anything. An example is if a
>      > function inside a hot loop is not inlined: from the perspective of the
>      > function, the checks only happen once each, and there is no place to
>      > hoist them to, but from the perspective of the program, this can mean
>      > numerous bound checks executed inside its hot loop. Another example is
>      > if the access index cannot be reasoned about from the surrounding
>      > context: the compiler cannot do anything here and must perform a bound
>      > check.
>      > - A bound check may not be cheap: it often consists of a memory load,
>      > a compare and jump, and an arithmetic instruction if the types of the
>      > container and the access do not match. Although it may not have any
>      > noticeable effect if the program is latency-bound, it can result in a
>      > massive regression if the program bottleneck is in the decoder or the
>      > execution ports. The issue is not only that the effect may be large,
>      > but also that it is unpredictable.
>      >
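The two situations in the first bullet can be illustrated with a minimal Java sketch. The class and method names here are hypothetical and not taken from the 1brc submission; whether a given JIT actually removes the checks in the first loop depends on the compiler and is only the typical outcome, not a guarantee.

```java
// Hypothetical illustration; names are not from the original proposal.
final class BoundCheckSketch {
    private BoundCheckSketch() {}

    // Sequential access: the index is provably within [0, data.length),
    // so a JIT can usually eliminate or hoist the per-access bound check.
    public static long sumSequential(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Data-dependent access: each index is loaded from another array, so the
    // compiler cannot reason about its range and must check every access.
    public static long sumIndirect(int[] data, int[] indices) {
        long sum = 0;
        for (int idx : indices) {
            sum += data[idx]; // bound check stays in the hot loop
        }
        return sum;
    }
}
```

Both methods are ordinary checked Java; the difference is only in how much the compiler can prove about the index.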
>      > As a single data point, for my 1brc submission, using the same
>      > approach, the only difference is how the accesses are done:
>      >
>      > - Using Unsafe [1]:
>      > Instruction count: 1.1e11 (1e9 lines)
>      > Compiled code run time: 7.422 ± 0.093 ms (1e6 lines)
>      > - Using the "everything" segment trick [2]:
>      > Instruction count: 1.4e11 (1e9 lines)
>      > Compiled code run time: 7.686 ± 0.181 ms (1e6 lines)
>      > - Using safe accesses [3]:
>      > Instruction count: 2e11 (1e9 lines)
>      > Compiled code run time: 9.009 ± 0.058 ms (1e6 lines)
>      >
>      > Looking at other languages, C++ is unchecked by default, and C#, Go, and
>      > even Rust all provide programmers the ability to access memory in
>      > an unsafe manner if the need arises. This shows that the necessity of
>      > unsafe accesses is evident.
>      >
>      > My proposal is to introduce a class java.lang.Unsafe that provides
>      > utility methods such as `static int arrayLoadUnchecked(int[], int)`.
>      > This method will attempt to load an element of an array at the
>      > specified index assuming that the array is not null and the index is
>      > not out of bounds. Normally, if one of these restrictions is violated,
>      > the method will throw an AssertionError. However, if the flag
>      > --enable-unsafe-access=<module-name> is provided when starting the
>      > program, then the compiler is allowed to elide the checks, which makes
>      > the access truly unchecked.
>      >
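The default (flag not provided) behaviour described above could look roughly like the following pure-Java sketch. This is only an illustration under assumptions: the class name UnsafeSketch is hypothetical (the proposal names java.lang.Unsafe, which does not exist), and the step where the compiler elides the checks under --enable-unsafe-access is not modelled, since that would happen inside the JIT.

```java
// Hypothetical stand-in for the proposed java.lang.Unsafe; not a real JDK class.
final class UnsafeSketch {
    private UnsafeSketch() {}

    // Checked fallback: validates the preconditions the caller promised to
    // uphold and throws AssertionError when one of them is violated.
    public static int arrayLoadUnchecked(int[] array, int index) {
        if (array == null || index < 0 || index >= array.length) {
            throw new AssertionError("invalid unchecked load at index " + index);
        }
        return array[index];
    }
}
```

For valid inputs the result is the same as `array[index]`, which is the property the proposal relies on when it says an unchecked access can be replaced by a checked one without changing program behaviour.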
>      > This is different from --enable-native-access due to the fact that
>      > functionally, a valid unchecked access and a valid checked access are
>      > equivalent, which makes it possible to replace an unchecked access
>      > with a checked access without compromising the functionality of the
>      > program. This approach has some benefits. Firstly, it allows
>      > libraries to not force usage of --enable-unsafe-access on their users.
>      > Secondly, a library can be used as a performance-critical component in
>      > some programs but not in others, and this solution allows only the
>      > programs that need it to utilise the unchecked access capability of the
>      > library. From the perspective of a program, it is able to minimise
>      > risk as modules not in critical sections will still perform bound
>      > checks as normal.
>      >
>      > This proposal is not without concerns. The first one is the unsafety
>      > of the feature itself, as an unchecked access can potentially crash
>      > the program, silently corrupt the process memory, or worse, result in
>      > program miscompilation. This is unavoidable given the unsafe nature of
>      > the proposal; however, the risk is minimised since this feature would
>      > be only used in very limited circumstances, and even then, the risk is
>      > present only in a limited range of applications. The second concern is
>      > regarding culture, that is the concern that developers may recklessly
>      > and carelessly use unsafe in their code. I think this is a valid but
>      > not evident concern. As it is much more readable and easier to write
>      > `arr[i]` than to write `Unsafe.arrayLoadUnchecked(arr, i)`, there is
>      > little chance that developers will recklessly use unchecked accesses,
>      > especially given the existing culture of using checked accesses in
>      > Java. Evidently, other languages that provide unsafe capabilities as
>      > non-default (C#, Go, and Rust) seem to not have issues with developers
>      > recklessly utilising them even after decades of history. The third
>      > concern is the burden of maintenance. I have thought about it and made
>      > a very minimal prototype of the feature [4]. My idea is that the
>      > accesses can be implemented purely in Java, and C2 will intercept and
>      > remove the checks. This will mostly be delegated to other routines
>      > already existing in C2, which minimises the overhead of maintenance.
>      >
>      > I expect this feature will only be used in performance-sensitive
>      > situations when every bound check counts, as they can accumulate really
>      > fast, such as in a JSON parsing library. This brings cascading effects
>      > as other libraries and programs can benefit from the improved
>      > performance if and only if the need arises.
>      >
>      > This is my summary and rewrite of the proposal, please let me know if
>      > you have any ideas or concerns. Thanks a lot,
>      > Quan Anh
>      >
>      > [1]: https://github.com/merykitty/1brc/tree/main
>      > [2]: https://github.com/merykitty/1brc/tree/removeunsafe
>      > [3]: https://github.com/merykitty/1brc/tree/varhandles
>      > [4]: https://github.com/openjdk/jdk/compare/master...merykitty:unsafe?expand=1
>