Null-terminated Unicode strings in java.io on Windows
Robert Lougher
rob.lougher at gmail.com
Fri Jan 25 16:45:51 UTC 2008
Hi,
Krzysztof Żelechowski-2 wrote:
>
>
> On Fri, 25-01-2008 at 13:14 +0000, Mark Wielaard wrote:
>> Krzysztof Żelechowski <program.spe at ...> writes:
>> > If the specification gets fixed so that GSC result MUST be z-term,
>> > your VM will cease being conformant
>> > so it will be fixed and no additional buffers will be needed.
>>
>> Eh, that doesn't seem right at all.
>> The specification currently doesn't guarantee that the result is a jchar
>> array that is zero terminated. So you can expect current runtimes not to do
>> this. As Roman said at least JamaicaVM doesn't do this. I just checked the
>> implementations gcj and jamvm, they both also don't make any such guarantee
>> (cacao does seem to add an extra 0 at the end of the result it returns
>> though).
>> So "clarifying the spec" would break a lot of code of currently conforming
>> implementations. The code relying on this behavior seems to be just buggy
>> and should be fixed imho.
>
> The specification is buggy
> in that it does not take into account the operating system interface
> and makes correct memory management inefficient
> for the benefit of sparing one byte per buffer
> where an OS call is not needed.
> Ridiculous.
> The developers at Sun
> found the correct way to interpreting the specification;
> the other ones followed it blindfolded. It is now time to repent.
>
Wrong! Requiring null termination will make things less efficient. This
is because Strings within Java are not null-terminated. So to add the null,
the VM will have to copy the String chars into a new buffer.
A more efficient approach is to simply return a pointer to the String chars
themselves. However, this will not be null-terminated.
The JNI specification allows a VM to either copy the chars or return a
direct pointer. The extra isCopy parameter can be used to find out what it
did.
The point is, if the programmer doesn't need a null-terminated string, not
copying is _much_ more efficient. The programmer can always copy and add
the null if they need to. But forcing the VM to null-terminate will require
a copy and slow it down in all cases.
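To illustrate the "copy and add the null" step on the caller's side, here is a minimal sketch in plain C. The helper name dup_with_terminator is made up for this example and is not part of JNI; uint16_t stands in for jchar, and chars/len stand in for what GetStringChars and GetStringLength would hand back in real native code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a JNI function. Copies `len` UTF-16 code units
   into a fresh buffer and appends a terminating 0. The caller owns the
   result and must free() it. Returns NULL if allocation fails. */
static uint16_t *dup_with_terminator(const uint16_t *chars, size_t len)
{
    assert(chars != NULL || len == 0);

    uint16_t *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (len > 0)
        memcpy(buf, chars, len * sizeof *buf);
    buf[len] = 0;   /* the terminator the OS call expects */
    return buf;
}
```

Only callers that actually pass the string to an API expecting a terminator pay for this copy; everyone else can keep using the pointer and length as returned.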
If I were updating the spec, I would change it so that if a copy is returned
it is always null-terminated. If it isn't a copy then it may or may not be.
It's likely no VMs will need changing, as I suspect the ones that do not
null-terminate are returning direct pointers (e.g. JamVM).
And I doubt Sun makes a copy because of the null. Giving out direct heap
pointers causes problems for VMs that move objects within the heap (e.g. a
compacting GC). Either you've got to "pin" the object so it can't move or
you always copy. Sun probably chose the latter. In JamVM, I decided to pin
the String (it's unpinned in ReleaseStringChars).
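The two strategies can be sketched with a toy model in plain C. This is only an illustration under stated assumptions: toy_string and the toy_* functions are invented for this example and do not reflect the internals of JamVM, Sun's VM, or any other implementation, and uint16_t stands in for jchar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a VM string object; not any real VM's layout. */
typedef struct {
    const uint16_t *chars;  /* character data, NOT null-terminated */
    size_t len;
    int pins;               /* > 0: a moving collector must leave it in place */
} toy_string;

/* Strategy A: pin the object and hand out a direct pointer.
   No copy is made, so there is no room to add a terminator. */
static const uint16_t *toy_get_chars_pinned(toy_string *s, bool *is_copy)
{
    s->pins++;
    if (is_copy != NULL)
        *is_copy = false;
    return s->chars;
}

/* Strategy B: always copy. Since a new buffer is allocated anyway,
   appending a terminating 0 costs one extra element. */
static const uint16_t *toy_get_chars_copied(const toy_string *s, bool *is_copy)
{
    uint16_t *buf = malloc((s->len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (s->len > 0)
        memcpy(buf, s->chars, s->len * sizeof *buf);
    buf[s->len] = 0;
    if (is_copy != NULL)
        *is_copy = true;
    return buf;
}

/* Release: unpin if the pointer is the direct one, otherwise free the copy. */
static void toy_release_chars(toy_string *s, const uint16_t *p)
{
    if (p == s->chars) {
        assert(s->pins > 0);
        s->pins--;
    } else {
        free((void *)p);
    }
}
```

In this model only the copying strategy can promise a terminator, which is why a rule of the form "copies are always null-terminated, direct pointers may not be" would not force the pinning strategy to change.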
Rob.
P.S. I hope your blindfold has been removed :) When implementing a VM, few
things are as straightforward as they may seem.
--
View this message in context: http://www.nabble.com/Null-terminated-Unicode-strings-in-java.io-on-Windows-tp15006673p15091812.html
Sent from the OpenJDK Core Libraries mailing list archive at Nabble.com.