RFR: 8065138 - Encodings.isRecognizedEnconding sometimes fails to recognize 'UTF8'

Martin Buchholz martinrb at google.com
Wed Nov 19 23:14:50 UTC 2014


It's certainly annoying to write a 20-line pure Java program to
replace the sed one-liner, but we've had success doing this for other
build tools.  Write a Java program that explicitly reads the input
as ISO-8859-1 to strip the comments, and it will work correctly
forever.
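
A minimal sketch of what such a tool might look like, assuming that
"comments" means lines whose first non-blank character is '#' or '!'.
The class name StripPropertiesComments and its methods are made up for
illustration and are not part of the JDK build.  ISO-8859-1 maps every
byte to exactly one char, so decoding can never hit an illegal byte
sequence and the 'å' byte passes through unchanged.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical tool: copies a properties file, dropping comment lines.
public class StripPropertiesComments {

    // Assumption: a comment is a line whose first non-blank character is '#' or '!'.
    public static boolean isComment(String line) {
        String t = line.trim();
        return t.startsWith("#") || t.startsWith("!");
    }

    // Reads and writes as ISO-8859-1, independent of the platform locale.
    public static void strip(Path in, Path out) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = r.readLine()) != null) {
                if (!isComment(line)) {
                    w.write(line);
                    w.write('\n');
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StripPropertiesComments <in> <out>");
            System.exit(2);
        }
        strip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

This sketch ignores backslash line continuations, so a continuation
line that happens to start with '#' would be dropped by mistake; a real
tool would need to handle that case.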

The lazy alternative is to ensure that all calls to sed and similar
system tools run with LC_ALL=C, which makes them treat their input as
plain bytes and also fixes the problem.
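
In a makefile this would just mean prefixing the command with
LC_ALL=C.  For consistency with the Java suggestion above, here is a
rough sketch of the same idea when the tool is launched from Java.
The class and method names are hypothetical; only ProcessBuilder and
its environment map are real API.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical wrapper; names are illustrative only.
public class RunSedInCLocale {

    // Builds (but does not start) a sed invocation that runs in the C locale.
    public static ProcessBuilder sedInCLocale(String script, Path file) {
        ProcessBuilder pb = new ProcessBuilder("sed", script, file.toString());
        // LC_ALL takes precedence over LANG and the individual LC_* variables.
        pb.environment().put("LC_ALL", "C");
        pb.inheritIO(); // let sed write straight to our stdout/stderr
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = sedInCLocale("s,x,x,g", Paths.get(args[0])).start();
        System.exit(p.waitFor());
    }
}
```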

On Wed, Nov 19, 2014 at 10:15 AM, Daniel Fuchs <daniel.fuchs at oracle.com> wrote:
> On 19/11/14 18:01, Martin Buchholz wrote:
>>
>> On Wed, Nov 19, 2014 at 3:17 AM, Daniel Fuchs <daniel.fuchs at oracle.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> Please find below a trivial fix for
>>>
>>> 8065138: Encodings.isRecognizedEnconding sometimes fails to
>>>           recognize 'UTF8'
>>> https://bugs.openjdk.java.net/browse/JDK-8065138
>>>
>>> webrev: http://cr.openjdk.java.net/~dfuchs/webrev_8065138/webrev.00/
>>>
>>> The root of the issue is with
>>>
>>> jaxp/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties
>>> It contains a special character 'å' which confuses the build
>>> system on Mavericks.
>>
>>
>> Isn't that a bug in the build system that really ought to be fixed?
>>
>> If properties files are to be stored as resources in jar files, they
>> should either be incorporated byte-for-byte identical, or they should
>> be decoded using ISO-8859-1 (as specified).  It may be best to leave
>> non-ASCII characters in the source files, as a "test" of the build
>> system and the jdk itself.
>
>
> Hmmm. If the character is indeed legal then you're right, fixing
> the build is probably a better idea.
>
> However the issue seems to be with using 'sed' over property files:
>
> If I simply do:
>
>   cat jaxp/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties \
>     | sed 's,x,x,g'
>
> on my machine, it balks with:
>
> sed: RE error: illegal byte sequence
>
> -- daniel
>
>
>>
>>> The Encodings.properties file ends up truncated in resources.jar - it
>>> contains only one line (the line before the special character was
>>> encountered).
>>> The fix is to replace the special character 'å' with its Unicode
>>> escape \u00e5.
>>>
>>> best regards,
>>>
>>> -- daniel
>>>
>


