Fastpath for new String(bytes...) and String.getBytes(...)

Ulf Zibis Ulf.Zibis at gmx.de
Wed Mar 25 23:52:18 UTC 2009


On 19.03.2009 20:02, Xueming Shen wrote:
> Ulf Zibis wrote:
>>
>> Isn't there any way even to avoid instantiating new ..Array-X-coder 
>> for each invocation of StringCoding.x-code(Charset cs, ...)?
>> Method x-code(byte/char[]) seems to be threadsafe, if replacement 
>> isn't changed, so I suppose, we could cache the ..Array-X-coder.
>>
> No. An "external" charset can do whatever it likes: it might still be 
> the same "object", and the de/encoder it "creates" might still be the 
> same "object", or look like the object you have cached, but do a 
> totally different thing.


At first glance, a user might think that String#getBytes(byte[] buf, 
Charset cs) is faster than String#getBytes(byte[] buf, String csn), 
assuming that the Charset is created internally from csn on every call.
As this is only true for the first call, there should be a *note* in 
the JavaDoc about the relative cost of these methods. Don't forget the 
JavaDoc of the (byte[] ...) constructors too.
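
To make the comparison concrete, here is a minimal sketch of the two call 
forms, using the public overloads String.getBytes(String) and 
String.getBytes(Charset). The class and method names (GetBytesVariants, 
byName, byCharset) are hypothetical and only serve the illustration. Both 
forms yield the same bytes; which one is cheaper depends on how the JDK 
caches coders internally, which is exactly what the JavaDoc note should 
spell out.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class GetBytesVariants {

    // Encode via the charset *name*; the name lookup happens inside String.
    static byte[] byName(String s, String csn) {
        try {
            return s.getBytes(csn);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: rethrow unchecked for this sketch.
            throw new IllegalArgumentException(csn, e);
        }
    }

    // Encode via an already resolved Charset object.
    static byte[] byCharset(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] a = byName("h\u00e9llo", "ISO-8859-1");
        byte[] b = byCharset("h\u00e9llo", latin1);
        System.out.println(java.util.Arrays.equals(a, b));
    }
}
```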

Secondly, I think that ASCII and ISO-8859-1 account for a high 
percentage of calls here, especially for CORBA applications, so why not 
have a fast shortcut in class String that does not use a 
Charset-X-coder internally, like getASCIIbytes() + 
getISO_8859_1Bytes(), or, more general and sophisticated:
    int getBytes(byte[] buf, byte mask) {
        // treat mask as unsigned: 0x7F for ASCII, 0xFF for ISO-8859-1
        int m = mask & 0xFF;
        int j = 0;
        for (int i = 0; i < values.length; i++, j++) {
            if ((values[i] | m) == m) {
                // char has no bits outside the mask: copy it directly
                buf[j] = (byte)values[i];
                continue;
            }
            // unmappable: consume a surrogate pair as one character
            if (Character.isHighSurrogate(values[i]))
                i++;
            buf[j] = '?'; // or default replacement
        }
        return j;
    }
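
For trying the idea outside of class String, the loop can be sketched as a 
standalone static method. This is only an illustration under assumptions: 
the class name MaskedEncode is hypothetical, the string is passed in 
explicitly instead of reading the internal char array, the mask is an int 
(0x7F for ASCII, 0xFF for ISO-8859-1), and the caller is assumed to supply 
a buf of at least s.length() bytes.

```java
public class MaskedEncode {

    // Writes one byte per character of s into buf and returns the byte count.
    // Chars with bits outside the mask are replaced by '?'; a surrogate
    // pair is replaced by a single '?'.
    static int getBytes(String s, byte[] buf, int mask) {
        int j = 0;
        for (int i = 0; i < s.length(); i++, j++) {
            char c = s.charAt(i);
            if ((c | mask) == mask) {
                // All set bits of c lie within the mask: copy directly.
                buf[j] = (byte) c;
                continue;
            }
            // Skip the low surrogate so the pair yields one replacement.
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;
            }
            buf[j] = (byte) '?'; // or default replacement
        }
        return j;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        int n = getBytes("caf\u00e9", buf, 0x7F);
        System.out.println(new String(buf, 0, n));
    }
}
```

Unlike a CharsetEncoder, this sketch has no per-call coder object and no 
configurable replacement, which is the point of the proposed shortcut.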

-Ulf




