Official support for Unsafe proposal

Tue Jan 16 02:45:56 UTC 2024

Hi,

I understand your concern, but I still don't think it is evident. Saying
that official support for Unsafe would give developers the impression
that "this is the way code should be written" seems like a stretch to
me. We have always had `sun.misc.Unsafe`, and developers who need it will
reach for it regardless. The proposal suggests a flag to disable bound
checks, which would be a compromise between normal accesses and Unsafe,
which is very powerful and can be used without the user of the program
ever being notified. Having coded professionally in Go, and having
participated in and observed the development of the Go programming
language for the last 2 years, I have never personally seen anyone write
`*((*int)(unsafe.Pointer(uintptr(unsafe.Pointer(unsafe.SliceData(arr))) +
uintptr(i) * 8)))` instead of `arr[i]`; in general, people actively avoid
the `unsafe` package. Furthermore, as you have noted, some of the
techniques we have discovered can only be used on off-heap memory.
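For comparison, a rough Java counterpart of the Go expression above is a read through `sun.misc.Unsafe`, which skips both the null check and the bound check. This is a minimal sketch, not code from the thread: the class and method names are hypothetical, javac warns that `sun.misc.Unsafe` is internal proprietary API, and JDK 23 (JEP 471) deprecates these memory-access methods for removal.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeVsIndex {
    private static final Unsafe U;

    static {
        try {
            // theUnsafe is the private singleton field; sun.misc is opened by jdk.unsupported.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Ordinary checked access: throws ArrayIndexOutOfBoundsException if i is out of range.
    static int checked(int[] arr, int i) {
        return arr[i];
    }

    // Unchecked access: the element's byte offset is computed by hand and no null or
    // bound check is performed; an invalid index reads arbitrary memory or crashes the VM.
    static int unchecked(int[] arr, int i) {
        long offset = Unsafe.ARRAY_INT_BASE_OFFSET + (long) i * Unsafe.ARRAY_INT_INDEX_SCALE;
        return U.getInt(arr, offset);
    }
}
```

As with the Go version, the unchecked read is much harder to write and to read than `arr[i]`, which is the point being made above.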

Of course, this is only my personal opinion and I have little experience
in language design, so feel free to disregard it. I am just concerned that
we are missing a crucial feature due to excessive caution.

Thanks a lot,
Quan Anh

On Tue, 16 Jan 2024 at 05:52, Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Hi Quân,
> as touched on in the amber-dev mailing list, while I don't doubt that
> adding something like unsafe blocks, or another Unsafe-like API, would
> allow dropping bound checks in places where "the developer knows better
> than the JIT", I'm a little skeptical of going down that path. First, I
> think that what you describe is mostly an issue with randomly accessed
> segments (as otherwise the usual bound check elimination kicks in,
> amortizing the costs associated with bound checks). Secondly, with some
> of the techniques outlined (such as the use of an everything segment), it
> seems that, at least as far as off-heap access goes, we can get quite
> close to Unsafe (as your experiment demonstrates) - and, as that
> solution relies on MemorySegment::reinterpret, running code relying on
> it would require the --enable-native-access flag, which would be a sign
> that the program is attempting to get rid of some of the usual safety
> guarantees. Third, once you have a "blessed" way to get to unchecked
> territory (esp. if that's a keyword, or an API in java.lang) it's easier
> for developers to get the message that "this is the way code should be
> written" - in reality performance is only _one_ of the many
> considerations developers should keep in mind while coding (albeit a
> very important one in a case like the 1BRC :-) ) - as a software
> engineer working with other engineers and having to maintain a code base
> that has to last _decades_ I'd be very concerned about the
> maintainability of the solution as well (and the 1BRC shows pretty
> clearly that there's a negative correlation between performance and
> readability of the generated code).
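A minimal sketch of the "everything" segment technique mentioned above, assuming JDK 22 or later (where `java.lang.foreign` is final; on JDK 21 it is a preview API). The class and method names are hypothetical and the code is not taken from the linked 1brc branch. The idea is to reinterpret `MemorySegment.NULL` to a huge size, so that any native address can be used directly as an offset into a single global segment.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class EverythingSegment {
    // A segment based at address 0 whose size spans (almost) the entire address space.
    // reinterpret is a restricted method: JDK 22 prints a warning unless the program
    // is launched with --enable-native-access.
    static final MemorySegment EVERYTHING = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Reads an int at an absolute native address. A bound check still runs, but only
    // against [0, Long.MAX_VALUE), and the liveness check always passes because
    // EVERYTHING has the global scope: nothing prevents reading freed memory.
    static int readInt(long address) {
        return EVERYTHING.get(ValueLayout.JAVA_INT, address);
    }

    // Writes through a normal, fully checked segment, then reads back via EVERYTHING.
    static int roundTrip(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16, 8); // 16 bytes, 8-byte aligned
            seg.set(ValueLayout.JAVA_INT, 8, value);
            return readInt(seg.address() + 8);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```

Running this prints 42; when launched without `--enable-native-access=ALL-UNNAMED`, JDK 22 also prints a warning about the restricted `reinterpret` call, which is the signal described above.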
>
> So, stepping back, I think we have to tread carefully: compromising the
> integrity of the Java platform (_for everyone_) to get some extra
> performance juice (_for the few who notice_) just doesn't seem like a
> very good deal.
>
> We will, of course, continue to look into ways to make memory segment
> access (and other forms of access) perform better and better (as happens
> in every other part of the Java runtime). It is possible we'll hit
> some roadblock - in which case we might look at other solutions (perhaps
> a VarHandle-based unchecked access factory), but something like
> "java.lang.Unsafe" would be way too prominent and accessible for its own
> good.
>
> Maurizio
>
> On 15/01/2024 19:48, Quân Anh Mai wrote:
> > Hi Panama folks,
> >
> > I first posted this proposal on amber-dev, where it was suggested
> > that I move it to this mailing list. Here is a summary of the
> > proposal.
> >
> > Java has made much progress in providing powerful alternatives to
> > sun.misc.Unsafe that still ensure safe behaviour. However, there is
> > one important use case of Unsafe that cannot be covered by a safe
> > alternative: the ability to access memory in an unsafe (unchecked)
> > manner.
> >
> > In the vast majority of cases, bound checks are either eliminated by
> > the compiler or negligibly cheap. However, there are always
> > exceptions.
> >
> > - The compiler can theoretically eliminate a lot of bound checks, but
> > there are cases where it cannot do anything. For example, if a
> > function called inside a hot loop is not inlined, then from the
> > perspective of that function each check happens only once and there
> > is no loop to hoist it out of, while from the perspective of the
> > program this can mean numerous bound checks executed inside its hot
> > loop. Similarly, if the access index cannot be reasoned about from the
> > surrounding context, the compiler cannot do anything and must perform
> > a bound check.
> > - A bound check is not necessarily cheap: it often consists of a
> > memory load (of the length), a compare-and-jump, and an extra
> > arithmetic instruction if the types of the container and the access
> > do not match. Although it may not have any noticeable effect if the
> > program is latency-bound, it can cause massive regressions if the
> > program is bottlenecked on the decoder or the execution ports. The
> > issue is not only that the effect may be large, but also that it is
> > unpredictable.
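To make the two bullets above concrete, here is a sketch of the access shapes they describe. The class and method names are hypothetical, and whether C2 actually eliminates a given check depends on its heuristics and on inlining decisions, so the comments describe typical behaviour rather than guarantees.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Objects;

final class BoundCheckShapes {
    // Views a byte[] as little-endian ints; offsets are in bytes and need not be aligned.
    private static final VarHandle INT_VIEW =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Index derived from the loop bound: C2 can usually prove i is in range
    // and eliminate the check on data[i].
    static long sum(int[] data) {
        long s = 0;
        for (int i = 0; i < data.length; i++) {
            s += data[i];
        }
        return s;
    }

    // Index taken from the data: the compiler knows nothing about keys[i],
    // so the check on counts[...] generally stays inside the loop.
    static int[] histogram(int[] keys, int buckets) {
        int[] counts = new int[buckets];
        for (int i = 0; i < keys.length; i++) {
            counts[keys[i]]++;
        }
        return counts;
    }

    // If this is not inlined into a caller's hot loop, it executes one check
    // per call and has no loop of its own to hoist the check out of.
    static int lookup(int[] table, int index) {
        return table[index];
    }

    // Container and access types differ (an int read from a byte[]): the
    // check needs the length, extra arithmetic (offset + 4), and a compare.
    static int readIntLE(byte[] buf, int offset) {
        Objects.checkFromIndexSize(offset, Integer.BYTES, buf.length);
        return (buf[offset] & 0xFF)
                | (buf[offset + 1] & 0xFF) << 8
                | (buf[offset + 2] & 0xFF) << 16
                | (buf[offset + 3] & 0xFF) << 24;
    }

    // The same read through a byte-array view VarHandle, which performs an
    // equivalent range check internally.
    static int readIntView(byte[] buf, int offset) {
        return (int) INT_VIEW.get(buf, offset);
    }
}
```

To reproduce the non-inlined case from the first bullet, `lookup` can be kept out of its caller with a HotSpot option such as `-XX:CompileCommand=dontinline,BoundCheckShapes::lookup`.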
> >
> > As a single data point, here are three versions of my 1brc submission
> > that use the same approach and differ only in how memory is accessed:
> >
> > - Using Unsafe [1]:
> > Instruction count: 1.1e11 (1e9 lines)
> > Compiled code run time: 7.422 ± 0.093 ms (1e6 lines)
> > - Using the "everything" segment trick [2]:
> > Instruction count: 1.4e11 (1e9 lines)
> > Compiled code run time: 7.686 ± 0.181 ms (1e6 lines)
> > - Using safe accesses [3]:
> > Instruction count: 2e11 (1e9 lines)
> > Compiled code run time: 9.009 ± 0.058 ms (1e6 lines)
> >
> > Looking at other languages, C++ is unchecked by default, while C#, Go,
> > and even Rust all give programmers the ability to access memory in an
> > unsafe manner if the need arises. This shows that the need for unsafe
> > accesses is evident.
> >
> > My proposal is to introduce a class java.lang.Unsafe that provides
> > utility methods such as `static int arrayLoadUnchecked(int[], int)`.
> > This method loads an element of an array at the specified index,
> > assuming that the array is not null and the index is not out of
> > bounds. Normally, if either of these preconditions is violated, the
> > method throws an AssertionError. However, if the flag
> > --enable-unsafe-access=<module-name> is provided when starting the
> > program, the compiler is allowed to elide the checks, which makes the
> > access truly unchecked.
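The Java-level semantics described above can be sketched as follows. This is only an illustration under assumptions: `java.lang.Unsafe` cannot be declared by user code, so the class name is hypothetical, and the flag-dependent elision of the checks is a JIT (C2) behaviour that plain Java cannot express. The prototype in [4] may look quite different.

```java
final class UncheckedSketch {
    private UncheckedSketch() {}

    // Hypothetical stand-in for the proposed Unsafe.arrayLoadUnchecked(int[], int).
    // Here the preconditions are always checked; under the proposal, C2 would be
    // allowed to drop them for modules named in --enable-unsafe-access.
    static int arrayLoadUnchecked(int[] array, int index) {
        if (array == null) {
            throw new AssertionError("array is null");
        }
        if (index < 0 || index >= array.length) {
            throw new AssertionError("index " + index + " out of bounds for length " + array.length);
        }
        return array[index];
    }

    // Library code written against the helper: for valid indices it computes
    // exactly what the arr[i] version computes.
    static long sum(int[] arr) {
        long s = 0;
        for (int i = 0; i < arr.length; i++) {
            s += arrayLoadUnchecked(arr, i);
        }
        return s;
    }
}
```

For valid indices, `sum` returns the same result as a loop over `arr[i]` whether or not the checks are elided, which is what would let a library use such a helper without forcing the flag on its users.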
> >
> > This is different from --enable-native-access because, functionally,
> > a valid unchecked access and a valid checked access are equivalent,
> > which makes it possible to replace an unchecked access with a checked
> > access without compromising the functionality of the program. This
> > approach has some benefits. Firstly, it allows libraries to avoid
> > forcing the use of --enable-unsafe-access on their users. Secondly, a
> > library can be a performance-critical component in some programs but
> > not in others; this solution allows only the programs that need it to
> > use the unchecked access capability of the library. From the
> > perspective of a program, it can minimise risk, as modules outside
> > critical sections will still perform bound checks as normal.
> >
> > This proposal is not without concerns. The first one is the unsafety
> > of the feature itself, as an unchecked access can potentially crash
> > the program, silently corrupt the process memory, or worse, result in
> > program miscompilation. This is unavoidable given the unsafe nature of
> > the proposal; however, the risk is minimised since this feature would
> > only be used in very limited circumstances, and even then, the risk is
> > present only in a limited range of applications. The second concern is
> > cultural: developers may recklessly and carelessly use unsafe accesses
> > in their code. I think this is a valid but not evident concern. As it
> > is much more readable and easier to write `arr[i]` than
> > `Unsafe.arrayLoadUnchecked(arr, i)`, there is little chance that
> > developers will recklessly use unchecked accesses, especially given
> > Java's existing culture of using checked accesses. Evidently, other
> > languages that provide unsafe capabilities as non-default (C#, Go, and
> > Rust) do not seem to have issues with developers recklessly using
> > them, even after decades of history. The third concern is the burden
> > of maintenance. I have thought about it and made a very minimal
> > prototype of the feature [4]. My idea is that the accesses can be
> > implemented purely in Java, and C2 will intercept them and remove the
> > checks. This will mostly be delegated to routines that already exist
> > in C2, which minimises the maintenance overhead.
> >
> > I expect this feature will only be used in performance-sensitive
> > situations where every bound check counts, since checks can
> > accumulate very quickly, such as in a JSON parsing library. This
> > brings cascading effects, as other libraries and programs can benefit
> > from the improved performance if and only if the need arises.
> >
> > This is my summary and rewrite of the proposal, please let me know if
> > you have any ideas or concerns. Thanks a lot,
> > Quan Anh
> >
> > [1]: https://github.com/merykitty/1brc/tree/main
> > [2]: https://github.com/merykitty/1brc/tree/removeunsafe
> > [3]: https://github.com/merykitty/1brc/tree/varhandles
> > [4]: https://github.com/openjdk/jdk/compare/master...merykitty:unsafe?expand=1
>