Null-terminated Unicode strings in java.io on Windows

Robert Lougher rob.lougher at gmail.com
Fri Jan 25 16:45:51 UTC 2008


Hi,


Krzysztof Żelechowski-2 wrote:
> 
> 
> On Fri, 25-01-2008 at 13:14 +0000, Mark Wielaard wrote:
>> Krzysztof Żelechowski <program.spe at ...> writes:
>> > If the specification gets fixed so that GSC result MUST be z-term, 
>> > your VM will cease being conformant 
>> > so it will be fixed and no additional buffers will be needed. 
>> 
>> Eh, that doesn't seem right at all.
>> The specification currently doesn't guarantee that the result is a jchar
>> array that is zero terminated. So you can expect current runtimes not to
>> do this. As Roman said at least JamaicaVM doesn't do this. I just checked
>> the implementations gcj and jamvm, they both also don't make any such
>> guarantee (cacao does seem to add an extra 0 at the end of the result it
>> returns though).
>> So "clarifying the spec" would break a lot of code of currently
>> conforming implementations. The code relying on this behavior seems to be
>> just buggy and should be fixed imho.
> 
> The specification is buggy
> in that it does not take into account the operating system interface 
> and makes correct memory management inefficient 
> for the benefit of sparing one byte per buffer 
> where an OS call is not needed.
> Ridiculous.
> The developers at Sun 
> found the correct way to interpret the specification; 
> the other ones followed it blindfolded.  It is now time to repent.
> 

Wrong!  Requiring null termination will make things less efficient, not
more.  Strings within Java are not null-terminated, so to add the null the
VM will have to copy the String chars into a new buffer.

A more efficient approach is to simply return a pointer to the String chars
themselves.  However, this will not be null-terminated.

The JNI specification allows a VM to either copy the chars or return a
direct pointer.  The isCopy out-parameter of GetStringChars tells the caller
which of the two it did.

The point is, if the programmer doesn't need a null-terminated string, not
copying is _much_ more efficient.  The programmer can always copy and add
the null if they need to.  But forcing the VM to null-terminate will require
a copy and slow it down in all cases.
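
As a rough sketch of the "copy and add the null yourself" option: the
following is plain C rather than real JNI code, so it can be built without a
VM.  The type utf16_unit stands in for jchar, and dup_with_terminator is a
made-up helper name, not part of any API.  In actual native code the pointer
and length would come from GetStringChars and GetStringLength.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for JNI's jchar (an unsigned 16-bit UTF-16 code unit). */
typedef uint16_t utf16_unit;

/*
 * Hypothetical helper: copy `len` UTF-16 code units from `chars` into a
 * freshly allocated buffer and append a terminating 0.
 * Returns NULL if the size would overflow or allocation fails; the caller
 * releases the result with free().
 */
static utf16_unit *dup_with_terminator(const utf16_unit *chars, size_t len)
{
    utf16_unit *buf;

    /* Make sure (len + 1) * sizeof *buf cannot overflow size_t. */
    if (len > SIZE_MAX / sizeof *buf - 1)
        return NULL;

    buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    if (len > 0)
        memcpy(buf, chars, len * sizeof *buf);
    buf[len] = 0;   /* the terminator the VM did not promise to provide */
    return buf;
}
```

Only code that really needs a terminated buffer (e.g. before handing it to an
OS call) pays for the copy; everything else can work from the pointer and
length directly.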

If I were updating the spec, I would change it so that if a copy is returned
it is always null-terminated.  If it isn't a copy then it may or may not be.
It's likely no VMs will need changing, as I suspect the ones that do not
null-terminate are returning direct pointers (e.g. JamVM).

And I doubt Sun makes a copy because of the null.  Giving out direct heap
pointers causes problems for VMs that move objects within the heap (e.g. a
compacting GC).  Either you've got to "pin" the object so it can't move or
you always copy.  Sun probably chose the latter.  In JamVM, I decided to pin
the String (it's unpinned in ReleaseStringChars).

Rob.

P.S.  I hope your blindfold has been removed :) When implementing a VM few
things are as straightforward as they may seem.

-- 
View this message in context: http://www.nabble.com/Null-terminated-Unicode-strings-in-java.io-on-Windows-tp15006673p15091812.html
Sent from the OpenJDK Core Libraries mailing list archive at Nabble.com.



