Null-terminated Unicode strings in java.io on Windows

Robert Lougher rob.lougher at gmail.com
Fri Jan 25 17:08:39 UTC 2008


Hi,

Apologies if you receive this twice.  I sent it via nabble and it's
now stuck awaiting moderation so I've subscribed.

<quote author="Krzysztof Żelechowski-2">

On Fri, 25-01-2008 at 13:14 +0000, Mark Wielaard wrote:
> Krzysztof Żelechowski <program.spe at ...> writes:
> > If the specification gets fixed so that GSC result MUST be z-term,
> > your VM will cease being conformant
> > so it will be fixed and no additional buffers will be needed.
>
> Eh, that doesn't seem right at all.
> The specification currently doesn't guarantee that the result is a jchar array
> that is zero terminated. So you can expect current runtimes not to do this. As
> Roman said at least JamaicaVM doesn't do this. I just checked the
> implementations gcj and jamvm, they both also don't make any such guarantee
> (cacao does seem to add an extra 0 at the end of the result it returns though).
> So "clarifying the spec" would break a lot of code of currently conforming
> implementations. The code relying on this behavior seems to be just buggy and
> should be fixed imho.

The specification is buggy
in that it does not take into account the operating system interface
and makes correct memory management inefficient
for the benefit of sparing one byte per buffer
where an OS call is not needed.
Ridiculous.
The developers at Sun
found the correct way to interpret the specification;
the other ones followed it blindfolded.  It is now time to repent.
</quote>

Wrong!  Requiring null termination will make things less efficient.
This is because Strings within Java are not null-terminated.  So to
add the null the VM will have to copy the String chars into a new
buffer.

A more efficient approach is to simply return a pointer to the String
chars themselves.  However, this will not be null-terminated.

The JNI specification allows a VM to either copy the chars or return a
direct pointer.  The extra isCopy parameter can be used to find out
what it did.

The point is, if the programmer doesn't need a null-terminated string,
not copying is _much_ more efficient.  The programmer can always copy
and add the null if they need to.  But forcing the VM to
null-terminate will require a copy and slow it down in all cases.
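
As a rough sketch of that caller-side copy, something like the
following would do.  It is plain C so it compiles without jni.h: the
helper name copy_with_terminator is hypothetical, uint16_t stands in
for jchar, and in real JNI code the pointer and length would come from
GetStringChars and GetStringLength.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of JNI.  uint16_t stands in for jchar
 * (a 16-bit unsigned type).  Returns a freshly malloc'd, zero-terminated
 * copy of chars[0..len-1], or NULL on failure.  The caller must free()
 * the result. */
static uint16_t *copy_with_terminator(const uint16_t *chars, size_t len)
{
    uint16_t *buf;

    /* Guard against overflow when computing (len + 1) * sizeof *buf. */
    if (len > SIZE_MAX / sizeof *buf - 1)
        return NULL;

    buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    if (len > 0)
        memcpy(buf, chars, len * sizeof *buf);
    buf[len] = 0;   /* the terminator the OS call expects */
    return buf;
}
```

Only the code paths that really need a terminated buffer pay for the
copy; everything else can use the chars and the length as they are.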

If I were updating the spec, I would change it so that if a copy is
returned it is always null-terminated.  If it isn't a copy then it may
or may not be.  It's likely no VMs will need changing, as I suspect
the ones that do not null-terminate are returning direct pointers
(e.g. JamVM).
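
To make the proposed rule concrete, here is a toy model in plain C.  It
is not JamVM's or anybody else's code: toy_string, toy_get_chars and
toy_release_chars are hypothetical names, uint16_t stands in for jchar,
and the want_copy argument is just a knob standing in for whatever
policy the VM has (in real JNI the caller does not get to choose).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: a "string" is a length plus 16-bit chars with no
 * terminator.  All names here are hypothetical. */
typedef struct {
    size_t len;
    const uint16_t *chars;   /* not zero-terminated */
} toy_string;

/* Either hands out the internal chars directly (no terminator
 * promised) or returns a copy.  Under the proposed rule a copy is
 * always zero-terminated.  *is_copy reports which one happened. */
static const uint16_t *toy_get_chars(const toy_string *s, int want_copy,
                                     int *is_copy)
{
    uint16_t *buf;

    if (!want_copy) {
        if (is_copy != NULL)
            *is_copy = 0;
        return s->chars;              /* direct pointer, cheap */
    }

    buf = malloc((s->len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (s->len > 0)
        memcpy(buf, s->chars, s->len * sizeof *buf);
    buf[s->len] = 0;                  /* copies always get the null */
    if (is_copy != NULL)
        *is_copy = 1;
    return buf;
}

/* Only a copy needs freeing; a direct pointer is left alone. */
static void toy_release_chars(const uint16_t *chars, int is_copy)
{
    if (is_copy)
        free((void *)chars);
}
```

A caller that needs the terminator checks is_copy: if it is set, the
buffer can go straight to the OS; if not, the caller makes its own
terminated copy.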

And I doubt Sun makes a copy because of the null.  Giving out direct
heap pointers causes problems for VMs that move objects within the
heap (e.g. a compacting GC).  Either you've got to "pin" the object so
it can't move or you always copy.  Sun probably chose the latter.  In
JamVM, I decided to pin the String (it's unpinned in
ReleaseStringChars).

Rob.

P.S.  I hope your blindfold has been removed :) When implementing a VM
few things are as straightforward as they may seem.

