RFR: 8065138 - Encodings.isRecognizedEnconding sometimes fails to recognize 'UTF8'

Martin Buchholz martinrb at google.com
Fri Nov 21 21:40:07 UTC 2014


A high-level followup ...

Running most text-based OS tools, including sed and sort, is risky
because the user's encoding may be different from the encoding of
source files in the JDK (of course, this is particularly problematic
with properties files, which must be ISO-8859-1).  These tools will
probably work better when run with LC_ALL=C.  We should consider
changing the definition of SED and SORT to something like
$(ENV) LC_ALL=C sed

As for set -o pipefail, it is sad that every single pipeline in the
makefiles is vulnerable to failure to detect broken builds.  The
workaround in http://stackoverflow.com/questions/23079651/equivalent-of-pipefail-in-gnu-make
is unaesthetic.  It's enough to make you rewrite all your build tools
in java!

On Thu, Nov 20, 2014 at 2:07 PM, Mandy Chung <mandy.chung at oracle.com> wrote:
> Daniel,
>
> Your test change looks fine and good to push this change that should catch
> if similar regression occurs in the future.  I was tempted to suggest to
> have a dedicated test for the build-time modification to the properties
> file.  Maybe a low priority RFE.
>
> Mandy
>
>
> On 11/20/14 9:25 AM, Daniel Fuchs wrote:
>>
>> On 20/11/14 14:36, Erik Joelsson wrote:
>>>
>>> Here is my proposal for fixing the particular issue of generating the
>>> correct properties files. I'm simply adding LC_ALL=C to the whole
>>> command line instead of just the sort at the end. It seems to require
>>> using "export" to get picked up.
>>
>>
>> Hi Erik,
>>
>> Looks good to me.
>>
>> I have applied your patch (manually, because the copy/paste
>> seems to have eaten the tab characters away, which caused patch to
>> reject the diff) - and I confirm that the issue has disappeared.
>>
>> Thanks for solving this!
>>
>> Do you think it would be worth to commit this test modification
>> later on, in a followup Bug/RFE?
>>
>>
>> http://cr.openjdk.java.net/~dfuchs/webrev_8065138/webrev.00/jdk/test/javax/xml/jaxp/Encodings/CheckEncodingPropertiesFile.java.frames.html
>>
>> If so I will take care of it.
>>
>> best regards,
>>
>> -- daniel
>>
>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8065138
>>> Patch:
>>> diff --git a/make/common/JavaCompilation.gmk
>>> b/make/common/JavaCompilation.gmk
>>> --- a/make/common/JavaCompilation.gmk
>>> +++ b/make/common/JavaCompilation.gmk
>>> @@ -400,13 +400,15 @@
>>>     # Now we can setup the depency that will trigger the copying.
>>>     $$($1_BIN)$$($2_TARGET) : $2
>>>       $(MKDIR) -p $$(@D)
>>> -    $(CAT) $$< | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e
>>> 's/\([^\\]\)=/\1\\=/g' \
>>> +    export LC_ALL=C ; $(CAT) $$< \
>>> +        | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e 's/\([^\\]\)=/\1\\=/g' \
>>>               -e 's/\([^\\]\)!/\1\\!/g' -e 's/#.*/#/g' \
>>>           | $(SED) -f "$(SRC_ROOT)/make/common/support/unicode2x.sed" \
>>>           | $(SED) -e '/^#/d' -e '/^$$$$/d' \
>>>               -e :a -e '/\\$$$$/N; s/\\\n//; ta' \
>>>               -e 's/^[     ]*//;s/[     ]*$$$$//' \
>>> -            -e 's/\\=/=/' | LC_ALL=C $(SORT) > $$@
>>> +            -e 's/\\=/=/' \
>>> +        | $(SORT) > $$@
>>>       $(CHMOD) -f ug+w $$@
>>>
>>>     # And do not forget this target
>>>
>>>
>>> I filed a separate issue [1] for investigating the use of pipefail.
>>>
>>> /Erik
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8065576
>>>
>>> On 2014-11-20 10:34, Daniel Fuchs wrote:
>>>>
>>>> On 11/20/14 10:26 AM, Erik Joelsson wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> On 2014-11-20 02:20, Martin Buchholz wrote:
>>>>>>
>>>>>> Amusingly, the $(SORT) has an LC_ALL=C carefully placed before it, but
>>>>>> the $(SED)s need it too!
>>>>>
>>>>> Yes, I think that's the correct fix in this case.
>>>>>>
>>>>>> On Wed, Nov 19, 2014 at 5:18 PM, Martin Buchholz
>>>>>> <martinrb at google.com> wrote:
>>>>>>>
>>>>>>> [+ build-dev]
>>>>>>>
>>>>>>> I think I see the problem.  By default, a Unix pipeline sadly fails
>>>>>>> only when the last command fails.  In bash, you can change this to a
>>>>>>> more sensible default via
>>>>>>> set -o pipefail
>>>>>>> but that's not portable enough for openjdk.
>>>>>
>>>>> This is interesting and something I had missed. I will experiment
>>>>> with enabling pipefail if configure finds support for it. This will
>>>>> include setting the SHELL to bash. We really should fail instead of
>>>>> silently generating broken builds.
>>>>>
>>>>> Daniel, I can take over this bug if you want and work on a proper
>>>>> build fix.
>>>>
>>>>
>>>> Thanks Erik! You are welcome!
>>>> So the whole issue seems to be that my default setting is
>>>> LC_ALL=en_US.UTF-8
>>>> whereas sed requires LC_ALL=C to work properly on these property
>>>> files...
>>>>
>>>> When the test first failed I tried to rerun the test with LC_ALL=C -
>>>> with no success
>>>> of course. But I never thought of rebuilding with LC_ALL=C :-(
>>>>
>>>> My apologies for the red herring :-(
>>>>
>>>> best regards
>>>>
>>>> -- daniel
>>>>
>>>>>
>>>>> /Erik
>>>>>>>
>>>>>>> $(CAT) $$< | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e
>>>>>>> 's/\([^\\]\)=/\1\\=/g' \
>>>>>>>         -e 's/\([^\\]\)!/\1\\!/g' -e 's/#.*/#/g' \
>>>>>>>     | $(SED) -f "$(SRC_ROOT)/make/common/support/unicode2x.sed" \
>>>>>>>     | $(SED) -e '/^#/d' -e '/^$$$$/d' \
>>>>>>>         -e :a -e '/\\$$$$/N; s/\\\n//; ta' \
>>>>>>>         -e 's/^[ ]*//;s/[ ]*$$$$//' \
>>>>>>>         -e 's/\\=/=/' | LC_ALL=C $(SORT) > $$@
>>>>>>>
>>>>>>> On Wed, Nov 19, 2014 at 5:07 PM, huizhe wang
>>>>>>> <huizhe.wang at oracle.com> wrote:
>>>>>>>>
>>>>>>>> On 11/19/2014 4:09 PM, Mandy Chung wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/19/2014 3:49 PM, Mandy Chung wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/19/2014 12:50 PM, Daniel Fuchs wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 11/19/14 9:36 PM, Mandy Chung wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> resources.jar will be gone when we move to the modular runtime
>>>>>>>>>>>> image
>>>>>>>>>>>> (JEP 220 [1]).
>>>>>>>>>>>> JDK-8065138 and JDK-8065365 will become non-issue in JDK 9.
>>>>>>>>>>>
>>>>>>>>>>> Do you mean that the property files will no longer be stripped
>>>>>>>>>>> of their
>>>>>>>>>>> comments?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (sorry for my delay in reply as I was trying to get the number
>>>>>>>>>> of the
>>>>>>>>>> resources in the modular image vs resources.jar but got
>>>>>>>>>> distracted.)
>>>>>>>>>>
>>>>>>>>>> The current version copies all bytes when generating the modular
>>>>>>>>>> image.
>>>>>>>>>> This is a good question whether we should strip off the comments
>>>>>>>>>> when
>>>>>>>>>> writing to the modular runtime image.   I think we should look
>>>>>>>>>> at the
>>>>>>>>>> footprint and any performance saving and determine if we should
>>>>>>>>>> do the same
>>>>>>>>>> in JDK 9.
>>>>>>>>>>
>>>>>>>>> I looked at the exploded image under
>>>>>>>>> BUILD_OUTPUTDIR/jdk/modules/java.xml
>>>>>>>>> and found that the comments of Encodings.properties are stripped.
>>>>>>>>> I was
>>>>>>>>> confused with the mention of resources.jar that I assume it was a
>>>>>>>>> step
>>>>>>>>> stripping the comments before writing to resources.jar. This is
>>>>>>>>> still
>>>>>>>>> an issue in jigsaw m2 I believe.
>>>>>>>>>
>>>>>>>>> Where does the build strip the comments?
>>>>>>>>
>>>>>>>>
>>>>>>>> A previous issue was that the build process strips off anything
>>>>>>>> after '#'
>>>>>>>> when copying properties files. In JDK8:
>>>>>>>> jaxp/make/BuildJaxp.gmk:
>>>>>>>> $(JAXP_OUTPUTDIR)/classes/%.properties:
>>>>>>>> $(JAXP_TOPDIR)/src/%.properties
>>>>>>>>          $(MKDIR) -p $(@D)
>>>>>>>>          $(RM) $@ $@.tmp
>>>>>>>>          $(CAT) $< | LANG=C $(NAWK) '{ sub(/#.*$$/,"#"); print }'
>>>>>>>> > $@.tmp
>>>>>>>>          $(MV) $@.tmp $@
>>>>>>>>
>>>>>>>> This was fixed in JDK 9.  The per-repo process was removed. It
>>>>>>>> looks like
>>>>>>>> the properties processing process is now consolidated into
>>>>>>>> make/common/JavaCompilation.gmk. So the issue Daniel found is new
>>>>>>>> (in terms
>>>>>>>> of stripping). Search 'properties files' to see the macro.
>>>>>>>>
>>>>>>>> Joe
>>>>>>>>
>>>>>>>>> Mandy
>>>>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the core-libs-dev mailing list