Unsigned Integers (was: Support for 64-bit pointers)

Ty Young youngty1997 at gmail.com
Fri Jan 1 22:24:10 UTC 2021


Sprinkling about a dozen methods onto the various number classes seems 
like an inelegant, kicking-the-(metaphorical)-can-down-the-road approach 
IMO when adding actual unsigned types "Just Works(TM)". I think the 
MemoryAccess class/current jextract is a good example of why kicking the 
(metaphorical) can down the road instead of fixing the actual issue (that 
Panama needs a higher level API) is a bad idea.
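
For reference, here is a minimal sketch of what the method-based approach 
looks like today. It uses only static helpers that already exist in 
java.lang (Java 8 and later); the class name UnsignedHelpersDemo and its 
wrapper methods are made up for illustration and are not part of any API.

```java
public class UnsignedHelpersDemo {
    // Widen a byte to an int without sign extension (result is 0..255).
    public static int widen(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // Divide two longs, treating both bit patterns as unsigned 64-bit values.
    public static long divide(long dividend, long divisor) {
        return Long.divideUnsigned(dividend, divisor);
    }

    // Render a long as an unsigned decimal string.
    public static String render(long value) {
        return Long.toUnsignedString(value);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xFF; // -1 when read as a signed byte
        long max = -1L;         // all bits set: 2^64 - 1 when read as unsigned

        System.out.println(widen(raw));                        // 255
        System.out.println(render(max));                       // 18446744073709551615
        System.out.println(divide(max, 2L));                   // 9223372036854775807
        System.out.println(Long.compareUnsigned(max, 1L) > 0); // true
    }
}
```

The values stay plain byte/long throughout; nothing in the type system 
records that they are meant to be unsigned, so the caller has to remember 
to pick the unsigned helper at every use.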


I'm not sure how having unsigned types would affect existing code, 
unless this is some kind of low-level technical issue? ubyte, ushort, 
uint, and ulong are all different from their signed variants. If code that 
used byte[] worked before, it should continue to work in the future, I 
would think.


I don't remember if anyone has said Valhalla is an absolute requirement 
for Panama to be shipped; I only remember it being said that it can 
help reduce the memory cost of MemoryLayout types. If 
current Panama is wired so that an unsigned byte is actually a short (as 
they are with VarHandles, IIRC), could Panama even change it in the 
future without breaking backwards compatibility, assuming Panama is 
released before Valhalla? The same could be asked about a high level API 
too, I think: if the lower level parts of Panama are shipped first and a 
higher level API is worked on after the fact, then what happens if some 
major API issue is found in the lower level 
parts (MemoryAddress/MemorySegment)?
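
To make the "unsigned byte is actually a short" concern concrete, here is 
a rough sketch of that kind of widening. It deliberately uses a plain 
java.nio.ByteBuffer as a stand-in rather than any Panama class, and the 
names WidenedUnsignedDemo, getUnsignedByte and putUnsignedByte are 
hypothetical, so treat it as an illustration of the pattern and not as how 
Panama is actually wired.

```java
import java.nio.ByteBuffer;

public class WidenedUnsignedDemo {
    // Read one byte and expose it as a short in the range 0..255.
    // The mask drops the sign extension that a plain byte-to-int
    // conversion would otherwise introduce.
    public static short getUnsignedByte(ByteBuffer buf, int offset) {
        return (short) (buf.get(offset) & 0xFF);
    }

    // Accept a short from the caller but store only a single byte.
    // Values outside 0..255 cannot be represented and are rejected.
    public static void putUnsignedByte(ByteBuffer buf, int offset, short value) {
        if (value < 0 || value > 0xFF) {
            throw new IllegalArgumentException(
                    "out of range for unsigned byte: " + value);
        }
        buf.put(offset, (byte) value);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1);
        putUnsignedByte(buf, 0, (short) 200);

        System.out.println(buf.get(0));              // -56 (signed view)
        System.out.println(getUnsignedByte(buf, 0)); // 200 (widened view)
    }
}
```

Once signatures like these are public, the short in them is part of the 
API contract, which is why switching to a real unsigned type later could 
be a compatibility problem.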


I thought for the longest time that the Java-only memory access parts of 
Panama were going to be shipped first, then the ABI, jextract, maybe some 
high level API if it's deemed necessary/possible, and finally non-C APIs, 
but the Java memory access parts and the ABI have been mixed in a single 
repository and the old ones abandoned.


On 1/1/21 2:23 PM, Johannes Kuhn wrote:
> To chime in - there is already limited support for unsigned int / 
> unsigned long.
>
> The short version is: You have to explicitly call 
> Long.toUnsignedString or Long.divideUnsigned. Same for int.
>
> So far, so good. But there are some missing things:
> * Math.addExactUnsigned - should throw if there is an **unsigned** 
> overflow.
> * Double.fromUnsignedLong - Turning an unsigned long into a double.
> * Byte.toUnsignedString
> * Double.toUnsignedLong - Turning a double into an unsigned long.
>
> I agree that using unsigned currently is clunky.
> There are some ways to work with that, sure, but the picture is not 
> yet complete.
>
> The big problem with separate unsigned types is backwards 
> compatibility: if you have an array of unsigned bytes, but the API 
> requires byte[], then you had better be able to convert between those two 
> representations without copying. So, IMHO, ask the amber guys for that.
>
> - Johannes
>
> PS.: I already intended to address those things in another feedback 
> mail, but this thread came first, so...
>
> On 01-Jan-21 19:56, Ty Young wrote:
>> Agreed, Java really needs unsigned types. Besides this specific 
>> issue, there is also a major issue representing unsigned types in a 
>> high-level API such as the previous Pointer API or my current API. In 
>> order to represent a 1-byte unsigned number (byte) you would have to 
>> use a 2-byte signed number (short), which distorts APIs and creates 
>> confusion for API users. If a function header says that you need an 
>> unsigned long then Java should be able to pass an actual unsigned 
>> long type to it.
>>
>>
>> On 1/1/21 12:42 PM, leerho wrote:
>>> Hi Radosław & Florian,
>>>
>>> On a related topic, I have wished for years that Java would finally 
>>> support
>>> unsigned integral types (byte, short, int, and long).  I'm sure that 
>>> I'm
>>> not the first to mention this, so forgive me if this has been hashed 
>>> out
>>> before.  But I do not understand why there would be resistance to this.
>>> For example, packing and unpacking "C-struct"-like data structures 
>>> is a
>>> PITA in Java.  Bytes and shorts have to be masked when upcasting to 
>>> ints
>>> and ints have to be masked when upcasting to longs, all because of the
>>> automatic sign-extension.   Not having unsigned types creates a ripe 
>>> area
>>> for bugs that can be hard to find.
>>>
>>> Is there any hope of Java finally getting full support of unsigned 
>>> types?
>>>
>>> Cheers, and Happy New Year,
>>>
>>> Lee.
>>>
>>>
>>>
>>> On Fri, Jan 1, 2021 at 9:49 AM Radosław Smogura <mail at smogura.eu> 
>>> wrote:
>>>
>>>> Hi Florian,
>>>>
>>>> That’s correct; moreover, only 48 bits are addressable.
>>>>
>>>> I’m not sure what I was thinking - I was doing some performance 
>>>> checks and
>>>> was concerned that unrestricted access looked like doing range checks
>>>> and... well found this unrelated topic.
>>>>
>>>> However, just to mention, some Linux distributions use vsyscalls 
>>>> (which are
>>>> going to be deprecated), and these are mapped to the tail of memory 
>>>> ([1] nice
>>>> asm code)
>>>>
>>>> So even if it’s not addressable it’s usable. I’m not sure if it’s worth
>>>> handling.
>>>>
>>>> Kind regards,
>>>> Rado
>>>>
>>>> [1]
>>>> https://stackoverflow.com/questions/7266813/how-does-the-gettimeofday-syscall-work 
>>>>
>>>>
>>>> On Jan 1, 2021, at 2:04 PM, Florian Weimer <fw at deneb.enyo.de> wrote:
>>>>
>>>> * Radosław Smogura:
>>>>
>>>> In the current version of the implementation (in many places), there are
>>>> a lot of range checks. However, longs in Java are signed and C pointers
>>>> are unsigned, so at least for x86-64 architectures this should be taken
>>>> into account; otherwise we would not be able to address the whole memory
>>>> in a straightforward way (largest block size is 2^31, can directly
>>>> address upper half of memory).
>>>>
>>>> The x86-64 architecture actually has signed addresses in the sense
>>>> that some number of upper bits of pointers must match the sign bit.
>>>> On Linux, userspace addresses always have a zero sign bit. I think
>>>> this is not true on Solaris.  I don't know about Windows.
>>>>
>>>> On some architectures, C struggles with similar issues because
>>>> ptrdiff_t is signed and does not cover the entire address space.  It's
>>>> therefore undefined to create objects whose size is greater than what
>>>> can be expressed as a ptrdiff_t value, despite the underlying
>>>> architecture supporting this.
>>>>
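
As an aside on the x86-64 point quoted above (upper pointer bits must 
match the sign bit): assuming 48-bit virtual addresses, which is the 
figure mentioned earlier in the thread, the rule can be sketched in Java 
as a sign-extension round trip on a long. The class name, the 
ADDRESS_BITS constant and isCanonical are made up for this example, and 
systems with wider virtual addresses would need a different constant.

```java
public class CanonicalAddressDemo {
    // Assumption: 48 significant virtual address bits.
    static final int ADDRESS_BITS = 48;

    // An address is canonical when bits 63..47 are all equal to bit 47.
    // Shifting left and then arithmetically right sign-extends the low
    // 48 bits; a canonical address survives that round trip unchanged.
    public static boolean isCanonical(long address) {
        int shift = Long.SIZE - ADDRESS_BITS; // 16
        return ((address << shift) >> shift) == address;
    }

    public static void main(String[] args) {
        long topOfLowerHalf    = 0x00007FFFFFFFFFFFL;
        long bottomOfUpperHalf = 0xFFFF800000000000L; // negative as a Java long
        long inTheGap          = 0x0000800000000000L;

        System.out.println(isCanonical(topOfLowerHalf));    // true
        System.out.println(isCanonical(bottomOfUpperHalf)); // true
        System.out.println(isCanonical(inTheGap));          // false
    }
}
```

Under that assumption, upper-half addresses simply show up as negative 
long values, so a signed long can hold every canonical address; it is 
range checks and comparisons that need care 
(Long.compareUnsigned rather than < and >).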

