The store for byte strings

Florian Weimer fw at deneb.enyo.de
Sun Jun 10 09:47:11 UTC 2018


* John Rose:

> In https://bugs.openjdk.java.net/browse/JDK-8161256 I discuss
> this nascent API under the name "ByteSequence", which is analogous
> to CharSequence, but doesn't mention the types 'char' or 'String'.

Very interesting.

What's the specification for toString() and hashCode()?

One problem of retrofitting a custom ByteString into a CharSequence is
that CharSequence reuses toString() in a fairly central fashion, and
it's hard to reconcile this with byte-based length() and charAt()
methods unless ISO-8859-1 encoding is used.
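To make the conflict concrete, here is a minimal sketch.  Latin1Bytes and
toUtf8String() are hypothetical names invented for this illustration; they
are not taken from the JDK or from JDK-8161256.  With ISO-8859-1, every byte
decodes to exactly one char, so toString() agrees with length() and
charAt().  Any multi-byte decoding such as UTF-8 breaks that agreement.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class (an assumption for illustration, not a JDK API):
// a CharSequence whose length() and charAt() are defined in terms of bytes.
final class Latin1Bytes implements CharSequence {
    private final byte[] bytes;

    Latin1Bytes(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the contents stay stable
    }

    @Override
    public int length() {
        return bytes.length; // counted in bytes, not in decoded characters
    }

    @Override
    public char charAt(int index) {
        return (char) (bytes[index] & 0xFF); // byte 0..255 maps to U+0000..U+00FF
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new Latin1Bytes(Arrays.copyOfRange(bytes, start, end));
    }

    @Override
    public String toString() {
        // ISO-8859-1 is a one-byte-per-char decoding, so it matches length()/charAt().
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical alternative: decoding as UTF-8 can yield a different length.
    String toUtf8String() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For the two bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9), length() and
toString().length() are both 2, while the UTF-8 decoding has length 1, which
is the mismatch a CharSequence implementation cannot tolerate.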

Is this feature supposed to land before JEP 218?  If not, how does
ByteSequence differ from List<byte>?

> If the ByteSequence views are value instances, they can be created
> at a very high rate with little or no GC impact.  Generic algorithms
> would still operate on them 

I'm not up-to-date with those upcoming changes.  Would the nature as
value instances be expressed as part of the ByteSequence interface
type?

> If the API is properly defined it can be inserted directly into
> existing types like ByteBuffer.  Doing this will probably require us
> to polish ByteBuffer a little, adding immutability as an option and
> lifting the 32-bit limits.  It should be possible to "freeze" a
> ByteBuffer or array and use it as a backing store that is reliably
> immutable, so it can be handed to zero-copy algorithms that work
> with ByteSequences.

Such freezing is incompatible with mapped byte buffers, right?  Even
if the implementation prevents changes at the VM/process level,
changes on the file system could well become visible.  Do you expect
to make freezing an optional operation (probably not a good idea), or
to copy the contents of the mapping to the heap (which is probably not
too bad, considering that a shared byte[] could also result in
arbitrarily large copies)?
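The difference between a read-only view and a frozen copy can be sketched
with today's ByteBuffer API.  FreezeSketch and frozenCopy are hypothetical
names, and a heap buffer stands in for the mapped buffer here (an assumption
made so that the example is self-contained); with a real mapping, the writer
could be another process changing the file.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a JDK API): "freezing" implemented as a heap copy.
final class FreezeSketch {
    static ByteBuffer frozenCopy(ByteBuffer source) {
        ByteBuffer dup = source.duplicate();     // leave the caller's position untouched
        byte[] copy = new byte[dup.remaining()]; // may be arbitrarily large
        dup.get(copy);                           // copy the remaining bytes to the heap
        return ByteBuffer.wrap(copy).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ByteBuffer backing = ByteBuffer.allocate(4);
        ByteBuffer readOnly = backing.asReadOnlyBuffer(); // shares content with backing
        ByteBuffer frozen = frozenCopy(backing);          // private snapshot

        backing.put(0, (byte) 42); // a later change to the backing store

        System.out.println(readOnly.get(0)); // the read-only view observes the change
        System.out.println(frozen.get(0));   // the copy does not
    }
}
```

asReadOnlyBuffer() only prevents writes through the view itself; it makes no
promise that the contents stay the same.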

> Independently, I want to eventually add frozen arrays, including
> frozen byte[] arrays, to the JVM, but that doesn't cover zero-copy use
> cases; it has to be an interface like CharSequence.

Well, there is already the VarHandle approach for that.  But it's not
a particularly rich interface and very far away from strings.
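For reference, a small sketch of what access through view VarHandles looks
like, on the assumption that the byte array and byte buffer view handles
from java.lang.invoke.MethodHandles are what is meant here.  VarHandleViews
and readInt are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical wrapper class; the two MethodHandles factories exist since Java 9.
final class VarHandleViews {
    // Reads or writes an int at a byte offset inside a byte[].
    static final VarHandle INT_IN_ARRAY =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Same access pattern, but with a ByteBuffer (heap, direct or mapped) as the store.
    static final VarHandle INT_IN_BUFFER =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    static int readInt(byte[] bytes, int offset) {
        return (int) INT_IN_ARRAY.get(bytes, offset);
    }

    static int readInt(ByteBuffer buffer, int offset) {
        return (int) INT_IN_BUFFER.get(buffer, offset);
    }
}
```

This gives typed access at an offset and nothing more: no length, no
comparison, no slicing, which is why it is a long way from a string-like
interface.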

> So the option I prefer is not on your list; it would be:
>
> (h) ByteSequence interface with retrofits to ByteBuffer, byte[], etc.
>
> This is more flexible than (f) the concrete ByteString class.  I think
> the ByteString you are thinking of would appear as a non-public class
> created by a ByteSequence factory, analogous to List::of.

Yes, this seems reasonable.  It's a bit of a drawback that the
immutable nature of a value cannot be expressed in the type system (so
that you have to remember to use ByteSequence::of to get a view of an
immutable object), but at least it's consistent with collections.
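A rough sketch of that drawback follows.  Everything in it is an assumption
made for illustration: the method names byteAt and viewOf, the long-based
length, and the ArrayBytes class are invented here, and the API discussed in
JDK-8161256 may look quite different.

```java
// Hypothetical interface, loosely modelled on CharSequence.
interface ByteSequence {
    long length();            // long rather than int: an assumption, not from the proposal

    byte byteAt(long index);

    // Factory analogous to List::of: copies, so the result cannot change afterwards.
    static ByteSequence of(byte... bytes) {
        return new ArrayBytes(bytes.clone());
    }

    // View over a caller-owned array: same static type, but no immutability guarantee.
    static ByteSequence viewOf(byte[] bytes) {
        return new ArrayBytes(bytes);
    }
}

// Non-public implementation class, created only through the factories above.
final class ArrayBytes implements ByteSequence {
    private final byte[] bytes;

    ArrayBytes(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public long length() {
        return bytes.length;
    }

    @Override
    public byte byteAt(long index) {
        return bytes[Math.toIntExact(index)]; // arrays are still int-indexed
    }
}
```

Both factories return a plain ByteSequence, so a method that receives one
cannot tell from the type whether the contents may still change, just as
with List::of versus a mutable List.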


More information about the core-libs-dev mailing list