Null-terminated Unicode strings in java.io on Windows
Krzysztof Żelechowski
program.spe at home.pl
Fri Jan 25 17:23:14 UTC 2008
On Fri, 25-01-2008 at 17:08 +0000, Robert Lougher wrote:
> Hi,
Hi-aye.
>
> Apologies if you receive this twice. I sent it via nabble and it's
> now stuck awaiting moderation so I've subscribed.
>
> <quote author="Krzysztof Żelechowski-2">
>
> On Fri, 25-01-2008 at 13:14 +0000, Mark Wielaard wrote:
> > Krzysztof Żelechowski <program.spe at ...> writes:
> > > If the specification gets fixed so that the GetStringChars (GSC) result MUST be zero-terminated (z-term),
> > > your VM will cease being conformant
> > > so it will be fixed and no additional buffers will be needed.
> >
> > Eh, that doesn't seem right at all.
> > The specification currently doesn't guarantee that the result is a jchar array
> > that is zero terminated. So you can expect current runtimes not to do this. As
> > Roman said at least JamaicaVM doesn't do this. I just checked the
> > implementations gcj and jamvm, they both also don't make any such guarantee
> > (cacao does seem to add an extra 0 at the end of the result it returns though).
> > So "clarifying the spec" would break a lot of code of currently conforming
> > implementations. The code relying on this behavior seems to be just buggy and
> > should be fixed imho.
>
> The specification is buggy
> in that it does not take into account the operating system interface
> and makes correct memory management inefficient
> for the benefit of sparing one byte per buffer
> where an OS call is not needed.
> Ridiculous.
> The developers at Sun
> found the correct way to interpret the specification;
> the other ones followed it blindfolded. It is now time to repent.
> </quote>
>
> Wrong! Requiring null termination will make things more inefficient.
> This is because Strings within Java are not null-terminated.
They are not z-term only in the sense that they may contain embedded zeros;
nothing more follows from that.
The implementation is free
to append a zero to each and every string buffer
and to make that zero inaccessible to Java, as the specification requires.
This is easy to do because strings are immutable.
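To illustrate the idea, here is a minimal sketch in C.
It assumes a hypothetical VM-internal string layout;
the names vm_string, vm_string_new, vm_string_chars and vm_string_free
are invented for this example and are not taken from any real VM or from JNI.
The buffer holds one code unit more than the length reported to Java,
so a GetStringChars-like accessor can return a direct pointer without copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical VM-internal string; not the layout of any real VM. */
typedef struct {
    size_t    length;  /* length visible to Java code */
    uint16_t *chars;   /* length + 1 units; chars[length] is always 0 */
} vm_string;

/* Create an immutable string, reserving one extra unit for the hidden zero. */
vm_string *vm_string_new(const uint16_t *src, size_t length) {
    assert(src != NULL || length == 0);
    vm_string *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->chars = malloc((length + 1) * sizeof *s->chars);
    if (s->chars == NULL) { free(s); return NULL; }
    if (length > 0) memcpy(s->chars, src, length * sizeof *s->chars);
    s->chars[length] = 0;  /* terminator that Java code never sees */
    s->length = length;
    return s;
}

/* Hypothetical stand-in for GetStringChars: direct pointer, no copy. */
const uint16_t *vm_string_chars(const vm_string *s, int *is_copy) {
    if (is_copy != NULL) *is_copy = 0;
    return s->chars;
}

void vm_string_free(vm_string *s) {
    if (s != NULL) { free(s->chars); free(s); }
}
```

The sketch assumes that every string owns its buffer.
A VM that shares one backing array between a string and its substrings
could not place the zero this way without copying.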
> So to
> add the null the VM will have to copy the String chars into a new
> buffer.
>
> A more efficient approach is to simply return a pointer to the String
> chars themselves. However, this will not be null-terminated.
It depends on the implementation, as described above.
>
> The JNI specification allows a VM to either copy the chars or return a
> direct pointer. The extra isCopy parameter can be used to find out
> what it did.
>
> The point is, if the programmer doesn't need a null-terminated string,
> not copying is _much_ more efficient. The programmer can always copy
> and add the null if they need to. But forcing the VM to
> null-terminate will require a copy and slow it down in all cases.
No, it will not,
because all string buffers will have an inaccessible zero at the end.
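For comparison, this is roughly the copy that native code must make
when it cannot rely on a terminator.
It is only a sketch: copy_with_terminator is a hypothetical helper,
and plain uint16_t stands in for jchar so that the example builds without jni.h.
In real JNI code the pointer and the length would come from
GetStringChars and GetStringLength.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `length` UTF-16 units and append a zero.
 * The caller owns the result and must free() it. Returns NULL on failure. */
uint16_t *copy_with_terminator(const uint16_t *chars, size_t length) {
    assert(chars != NULL || length == 0);
    uint16_t *buf = malloc((length + 1) * sizeof *buf);
    if (buf == NULL) return NULL;
    if (length > 0) memcpy(buf, chars, length * sizeof *buf);
    buf[length] = 0;  /* the terminator the OS call expects */
    return buf;
}
```

Every OS call that takes a string pays for this allocation and copy,
which is the cost the hidden zero is meant to avoid.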
>
> If I was updating the spec, I would change it so that if a copy is
> returned it is always null terminated. If it isn't a copy then it may
> or may not be. It's likely no VMs will need changing, as I suspect
> the ones that do not null-terminate are returning direct pointers
> (e.g. JamVM).
If I was updating the spec,
I would say that
strings are required to carry an inaccessible terminating zero, as above,
if the underlying OS expects zero-terminated strings in most cases.
>
> And I doubt Sun makes a copy because of the null.
So do I; apparently they do not need to.
>
> Rob.
>
> P.S. I hope your blindfold has been removed :) When implementing a VM
> few things are as straight-forward as they may seem.
So do I (that my blindfold has been removed).
Chris