Null-terminated Unicode strings in java.io on Windows
Robert Lougher
rob.lougher at gmail.com
Fri Jan 25 17:30:07 UTC 2008
On 1/25/08, Krzysztof Żelechowski <program.spe at home.pl> wrote:
>
> On 25-01-2008, Fri at 17:08 +0000, Robert Lougher wrote:
> > Hi,
>
> Hi-aye.
>
> >
> > Apologies if you receive this twice. I sent it via Nabble and it's
> > now stuck awaiting moderation, so I've subscribed.
> >
> > <quote author="Krzysztof Żelechowski-2">
> >
> > On 25-01-2008, Fri at 13:14 +0000, Mark Wielaard wrote:
> > > Krzysztof Żelechowski <program.spe at ...> writes:
> > > > If the specification gets fixed so that the GSC (GetStringChars)
> > > > result MUST be z-term, your VM will cease being conformant,
> > > > so it will be fixed and no additional buffers will be needed.
> > >
> > > Eh, that doesn't seem right at all.
> > > The specification currently doesn't guarantee that the result is a jchar array
> > > that is zero-terminated, so you should expect current runtimes not to do this. As
> > > Roman said, at least JamaicaVM doesn't do this. I just checked the gcj and
> > > jamvm implementations; neither of them makes any such guarantee either
> > > (cacao does seem to add an extra 0 at the end of the result it returns, though).
> > > So "clarifying the spec" would break a lot of code of currently conforming
> > > implementations. The code relying on this behavior seems to be just buggy and
> > > should be fixed imho.
> >
> > The specification is buggy
> > in that it does not take into account the operating system interface
> > and makes correct memory management inefficient
> > for the benefit of sparing one byte per buffer
> > where an OS call is not needed.
> > Ridiculous.
> > The developers at Sun
> > found the correct way to interpret the specification;
> > the other ones followed it blindly. It is now time to repent.
> > </quote>
> >
> > Wrong! Requiring null termination will make things more inefficient.
> > This is because Strings within Java are not null-terminated.
>
> They are not z-term in the sense that they may contain zero inside,
> but nothing more.
> The implementation is free
> to affix zero to each and every string buffer
> and make that zero unavailable to Java as required by the specification.
> It is an easy thing to do because strings are immutable.
>
> > So to
> > add the null the VM will have to copy the String chars into a new
> > buffer.
> >
> > A more efficient approach is to simply return a pointer to the String
> > chars themselves. However, this will not be null-terminated.
>
> It depends on the implementation, as described above.
No, it doesn't. An implementation would have to be truly stupid to
null-terminate internally. How many Strings are in the heap? How
many will the programmer access via GetStringChars? The null would be
an overhead on every String for the benefit of a minuscule percentage.
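To make the trade-off concrete, here is a minimal C sketch under stated assumptions, not any VM's actual code. It models a VM String as a hypothetical `vm_string` struct (characters plus an explicit length, no stored terminator) and uses `uint16_t` as a stand-in for `jchar`; `vm_string`, `vm_get_chars` and `copy_terminated` are illustrative names, not part of JNI. `vm_get_chars` plays the role of a GetStringChars that hands back the String's own characters, and `copy_terminated` is the extra copy that a caller (or a VM obliged to guarantee termination) has to make before passing the text to a Windows API expecting a null-terminated wide string. In real JNI code the same copy could be written with `GetStringLength` and `GetStringRegion` into a buffer of `len + 1` jchars.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for jchar, which JNI defines as an unsigned 16-bit type. */
typedef uint16_t vm_char;

/* Hypothetical model of a VM's internal String: characters plus an explicit
   length, with no terminating zero stored after the last character. */
typedef struct {
    const vm_char *chars;
    size_t len;
} vm_string;

/* Zero-copy access, analogous to a VM returning a pointer to the String's own
   characters. The result is NOT null-terminated; callers must use len_out. */
const vm_char *vm_get_chars(const vm_string *s, size_t *len_out) {
    *len_out = s->len;
    return s->chars;
}

/* Hypothetical helper: copy the characters into a buffer one element longer
   and append a zero terminator, as needed before calling an API that expects
   a null-terminated wide string. Returns NULL on allocation failure; the
   caller must free() the result. */
vm_char *copy_terminated(const vm_string *s) {
    size_t len;
    const vm_char *src = vm_get_chars(s, &len);
    assert(src != NULL || len == 0);          /* empty strings may have no storage */
    vm_char *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (len > 0)
        memcpy(buf, src, len * sizeof *buf);  /* O(n) copy paid only by this caller */
    buf[len] = 0;                             /* the terminator the String itself lacks */
    return buf;
}
```

In this model the copy costs one allocation and O(n) work per call, paid only by code that actually needs a terminator; storing a terminator inside every String instead would cost one extra 16-bit element for every String in the heap, whether or not it is ever passed to native code.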