Review/comment needed for the new public java.util.Base64 class

Ulf Zibis Ulf.Zibis at CoSoCo.de
Fri Nov 30 22:01:09 UTC 2012


Hi Sherman,

Is this issue still open?

-Ulf


On 03.11.2012 21:33, Ulf Zibis wrote:
> On 30.10.2012 23:40, Xueming Shen wrote:
>>> I'm "confused" about the order of xxcode() and Xxcoder.
>>> In other places, e.g. charsets, we have de... before en..., which is also the alphabetical order.
>>>
>> should not be an issue. javadoc output should be in alphabetic order. If it really
>> concerns you, I can do a copy/paste :-)
>
> Yes, it doesn't matter much, but it would be a nicely consistent style, so "personally" I would
> like the copy/paste :-)
>
>>> - What (should) happen(s) if lineSeparator.length == 0?
>> do nothing. expected?
>
> I would say an empty line separator should not be allowed, so you might check:
>      Objects.requireNonEmpty(lineSeparator);
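>
> For illustration, a minimal sketch of such a check (there is no
> Objects.requireNonEmpty in java.util.Objects, so a hypothetical helper is
> assumed here):
>
>      static byte[] requireNonEmpty(byte[] lineSeparator) {
>          if (lineSeparator.length == 0)
>              throw new IllegalArgumentException("lineSeparator must not be empty");
>          return lineSeparator;
>      }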
>
>>>  603         static {
>>>  604             Arrays.fill(fromBase64, -1);
>>>  605             for (int i = 0; i < Encoder.toBase64.length; i++)
>>>  606                 fromBase64[Encoder.toBase64[i]] = i;
>>>  607             fromBase64['='] = -2;
>>>  608         }
>>> This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
>>> table to the outer class.
>>> Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)?
>>
>> understood. 
>
> It seems you have mixed 2 tweaks into one. ;-) See the additional paragraph at the end ...
>
>> but it appears HotSpot likes it when the constant/fixed-length lookup
>> table is inside the encoder.
> Yes, but see my suggestion further below.
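>
> For the class-loading tweak itself, what I meant is roughly the following
> (only a sketch; structure and names assumed, not copied from the webrev):
>
>      public class Base64 {
>          // shared alphabet in the outer class; Decoder's static initializer
>          // then no longer forces the Encoder class to be loaded
>          static final char[] toBase64 =
>              ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>               + "0123456789+/").toCharArray();
>
>          public static class Encoder { /* reads Base64.toBase64 */ }
>
>          public static class Decoder {
>              static final int[] fromBase64 = new int[256];
>              static {
>                  java.util.Arrays.fill(fromBase64, -1);
>                  for (int i = 0; i < toBase64.length; i++)
>                      fromBase64[toBase64[i]] = i;
>                  fromBase64['='] = -2;
>              }
>          }
>      }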
>
>> Same as your suggestion in your previous email to use a
>> String in the source and expand it at runtime. It saves about 500 bytes but slows
>> down the server VM.
>
> Are you really sure? As it only runs once at class init, JIT should not compile this code.
> Executing the bytecode to init the final int[], value by value, in the interpreter should not be
> faster than expanding a String source in a loop.
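>
> For comparison, the String variant I had in mind is only a sketch along these
> lines (assuming it is the encoder's alphabet table this refers to; names assumed):
>
>      private static final char[] toBase64 =
>          ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>           + "0123456789+/").toCharArray();
>
> The <clinit> runs once either way; the difference is only whether the interpreter
> executes the 64 individual array stores of an array initializer or a single
> toCharArray() call.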
>
>> Those repeating lines of "isURL? ...." are also supposed to be a
>> performance tweak to eliminate the array bounds check in the loop.
>
> Yep, so I was thinking about:
>  653         private final int[] base64;
>  654         private final boolean isMIME;
>  655
>  656         private Decoder(boolean isURL, boolean isMIME) {
>  657             base64 = isURL ? fromBase64URL : fromBase64;
>  658             this.isMIME = isMIME;
>  659         }
> .....
>  911         private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
>  912             int[] base64 = this.base64; // local copy for performance reasons (but maybe 
> superfluous if HotSpot is intelligent enough)
> or:
>  911         private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] base64) {
> .....
>
>>> Why do you mix the algorithms to check the padding?:
>>>  824                         if (b == -2) {   // padding byte
>>>  890                     if (b == '=') {
>>>
>> It is supposed to reduce one "if" check for a normal base64 character inside the
>> loop. I'm not that sure it really helps, though.
>
>  925                     if (b == '=') {
> --> causes one additional "if" in the _main_ path of the loop, so it should be slower for regular input
>
>  859                         if (b == -2) {   // padding byte
> --> causes one additional "if" in the _side_ path of the loop, so it should only affect irregular input
> Maybe the following code is a little faster, as no loading of the constant '-2' is required:
>  858                     if ((b = base64[b]) < 0) {
>  859                         if (++b < 0) {   // padding byte '='
>
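> Put together, the branch layout I have in mind is roughly this (only a sketch;
> the helper name and parameters are assumed):
>
>      static int decodeChar(int[] fromBase64, int c) {
>          int b = fromBase64[c & 0xff];
>          if (b < 0) {                   // side path: regular input never enters
>              if (++b < 0)               // was -2, i.e. the padding byte '='
>                  return -1;             // signal padding to the caller
>              throw new IllegalArgumentException("Illegal base64 character " + c);
>          }
>          return b;                      // main path: plain 6-bit value, no '=' check
>      }
>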
>
> Once again (the following was meant for the decode algorithm, not initialisation):
> Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)?
> Retrieving the bytes by b = base64[x] could then be done without an address shift, and the JIT
> could use smaller/faster LoadB operations. In an int[] table, the index must be shifted left by 2
> first. Maybe this doesn't directly affect the performance on x86/IA64 CPUs, but as a side effect it
> wastes CPU cache space that other tasks could use. On ARM architectures, I imagine, directly addressing
> a byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint 
> of the table would be smaller.
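>
> A sketch of the byte[] variant of that table (toBase64 as in the webrev, the
> rest of the names assumed, only to make the question concrete):
>
>      static final byte[] fromBase64 = new byte[256];
>      static {
>          java.util.Arrays.fill(fromBase64, (byte) -1);
>          for (int i = 0; i < toBase64.length; i++)
>              fromBase64[toBase64[i]] = (byte) i;
>          fromBase64['='] = -2;
>      }
>
> A lookup like "byte b = fromBase64[c];" can then use a plain byte load with a
> byte-stepped index, and the table shrinks from 1024 to 256 bytes.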
>
> Little nit:
> You could delete line 641 for conformity with 629.
>
> -Ulf
>
>



