RFR 8025003: Base64 should be less strict with padding

Xueming Shen xueming.shen at oracle.com
Wed Oct 23 22:42:27 UTC 2013


Hi,

The current spec and implementation of the Base64 decoder [1] follow the
standard/RFC: the trailing padding character(s) are interpreted and decoded
correctly if present. Trailing padding is not required (the decoder is liberal
here), but if it is present, the spec and implementation require that it MUST be
encoded correctly. Any incorrect padding combination in the final unit (as listed
below) is treated as incorrectly encoded base64 data and results in an exception.

The possible patterns of an incorrectly padded final base64 unit are:

     xxxx =       unnecessary padding character at the end of encoded stream
     xxxx xx=     missing the last padding character
     xxxx xx=y    missing the last padding character, instead having a non-padding char
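
To make the three patterns concrete, here is a minimal sketch that feeds them
to the strict basic decoder, Base64.getDecoder(). The class name, the helper
rejected() and the sample strings are mine, for illustration only; "QUJD" is the
complete unit for "ABC" and "RA" carries one more byte, 'D'. It assumes a
decoder that follows the rule described above (padding optional, but it must be
correct if present).

```java
import java.util.Base64;

final class StrictPaddingDemo {
    /** Returns true if the basic decoder rejects the input with an exception. */
    static boolean rejected(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "QUJDRA==",  // correctly padded final unit
            "QUJDRA",    // padding omitted entirely
            "QUJD=",     // pattern: xxxx =
            "QUJDRA=",   // pattern: xxxx xx=
            "QUJDRA=y",  // pattern: xxxx xx=y
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + (rejected(s) ? "rejected" : "accepted"));
        }
    }
}
```

The first two inputs should be accepted and the last three rejected, which is
exactly the strictness the feedback is about.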

The feedback we have received so far suggests that an "incorrectly encoded
padding unit" may be observed frequently in real-world use, especially in the
MIME/email world. It may therefore be desirable to simply accept such an
incorrectly encoded final unit and decode the rest successfully, without
throwing an exception.

It has also been suggested that it might be more appropriate to rename
     Base64.getEncoder(int lineLength, byte[] sept)
to be
     Base64.getMimeEncoder(int, byte[]).
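
For reference, a short sketch of how the factory method reads under the proposed
name. The class name, the encode() helper, the line length of 8 and the sample
text are illustrative assumptions, not part of the webrev.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class MimeEncoderDemo {
    /** Encodes data as base64 split into lines of at most lineLength chars, using CRLF. */
    static String encode(byte[] data, int lineLength) {
        byte[] crlf = {'\r', '\n'};
        return Base64.getMimeEncoder(lineLength, crlf).encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = "hello, base64 world".getBytes(StandardCharsets.US_ASCII);
        String encoded = encode(data, 8);
        System.out.println(encoded);

        // The mime decoder skips the line separators, so the data round-trips.
        byte[] back = Base64.getMimeDecoder().decode(encoded);
        System.out.println(new String(back, StandardCharsets.US_ASCII));
    }
}
```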

The proposed changes here are to

(1) rename the factory method for the customizable "mime" encoder to
     Base64.getMimeEncoder(int, byte[]);

(2) change the spec/implementation of the "mime" decoder to be lenient when
     handling the padding character in the final unit (the mime decoder itself is
     already "lenient": its spec requires that any non-base64 character be ignored
     during decoding, and our existing decoder is liberal when no padding is
     present at all).
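
The leniency the mime decoder already has can be sketched as follows. The class
name, the decodeMime() helper and the inputs are made up for illustration. The
sketch only exercises the existing behaviour (non-base64 characters skipped,
padding allowed to be absent), not the proposed handling of malformed padding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class MimeLenientDemo {
    /** Decodes with the mime decoder and returns the result as ASCII text. */
    static String decodeMime(String encoded) {
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        return new String(decoded, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Line separators inside the encoded data are ignored.
        System.out.println(decodeMime("QUJD\r\nRA=="));
        // Other characters outside the base64 alphabet are ignored as well.
        System.out.println(decodeMime("QU!JD RA=="));
        // A final unit with no padding at all is accepted.
        System.out.println(decodeMime("QUJDRA"));
    }
}
```

All three calls should yield "ABCD".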

Here is the webrev

http://cr.openjdk.java.net/~sherman/8025003/webrev/

thanks!
-Sherman

Btw, the updated mime decoder still throws an exception for patterns like
"xxxx x=..." or "xxxx x", in which the last unit has only one valid 6-bit "byte"
of data. That is not sufficient to be decoded into a valid 8-bit byte.
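
The arithmetic behind this is simple enough to sketch. Each base64 character
carries 6 bits, so the number of whole bytes in a final unit is the bit count
divided by 8, rounded down. The class and method names are hypothetical.

```java
final class FinalUnitBits {
    /** Whole 8-bit bytes recoverable from the given number of 6-bit base64 characters. */
    static int wholeBytes(int base64Chars) {
        return (base64Chars * 6) / 8;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " char(s) = " + (n * 6) + " bits -> "
                    + wholeBytes(n) + " byte(s)");
        }
    }
}
```

A single trailing character gives 6 bits and therefore zero whole bytes, so
there is nothing a lenient decoder could sensibly produce for it.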



