RFR: 8065138 - Encodings.isRecognizedEnconding sometimes fails to recognize 'UTF8'

Daniel Fuchs daniel.fuchs at oracle.com
Mon Nov 24 11:00:59 UTC 2014


On 24/11/14 11:44, Erik Joelsson wrote:
> Hello Daniel,
>
> Test seems like a great idea. Thanks!

OK - I have logged JDK-8065748
https://bugs.openjdk.java.net/browse/JDK-8065748

I'll send a patch for review when your fix is in :-)

Thanks!
-- daniel

>
> /Erik
>
> On 2014-11-20 18:25, Daniel Fuchs wrote:
>> On 20/11/14 14:36, Erik Joelsson wrote:
>>> Here is my proposal for fixing the particular issue of generating the
>>> correct properties files. I'm simply adding LC_ALL=C to the whole
>>> command line instead of just the sort at the end. It seems to require
>>> using "export" to get picked up.
>>
>> Hi Erik,
>>
>> Looks good to me.
>>
>> I have applied your patch (manually, because the copy/paste
>> seems to have eaten the tab characters away, which caused patch to
>> reject the diff) - and I confirm that the issue has disappeared.
>>
>> Thanks for solving this!
>>
>> Do you think it would be worth to commit this test modification
>> later on, in a followup Bug/RFE?
>>
>> http://cr.openjdk.java.net/~dfuchs/webrev_8065138/webrev.00/jdk/test/javax/xml/jaxp/Encodings/CheckEncodingPropertiesFile.java.frames.html
>>
>>
>> If so I will take care of it.
>>
>> best regards,
>>
>> -- daniel
>>
>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8065138
>>> Patch:
>>> diff --git a/make/common/JavaCompilation.gmk
>>> b/make/common/JavaCompilation.gmk
>>> --- a/make/common/JavaCompilation.gmk
>>> +++ b/make/common/JavaCompilation.gmk
>>> @@ -400,13 +400,15 @@
>>>     # Now we can setup the depency that will trigger the copying.
>>>     $$($1_BIN)$$($2_TARGET) : $2
>>>       $(MKDIR) -p $$(@D)
>>> -    $(CAT) $$< | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e
>>> 's/\([^\\]\)=/\1\\=/g' \
>>> +    export LC_ALL=C ; $(CAT) $$< \
>>> +        | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e 's/\([^\\]\)=/\1\\=/g' \
>>>               -e 's/\([^\\]\)!/\1\\!/g' -e 's/#.*/#/g' \
>>>           | $(SED) -f "$(SRC_ROOT)/make/common/support/unicode2x.sed" \
>>>           | $(SED) -e '/^#/d' -e '/^$$$$/d' \
>>>               -e :a -e '/\\$$$$/N; s/\\\n//; ta' \
>>>               -e 's/^[     ]*//;s/[     ]*$$$$//' \
>>> -            -e 's/\\=/=/' | LC_ALL=C $(SORT) > $$@
>>> +            -e 's/\\=/=/' \
>>> +        | $(SORT) > $$@
>>>       $(CHMOD) -f ug+w $$@
>>>
>>>     # And do not forget this target
>>>
>>>
>>> I filed a separate issue [1] for investigating the use of pipefail.
>>>
>>> /Erik
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8065576
>>>
>>> On 2014-11-20 10:34, Daniel Fuchs wrote:
>>>> On 11/20/14 10:26 AM, Erik Joelsson wrote:
>>>>> Hello,
>>>>>
>>>>> On 2014-11-20 02:20, Martin Buchholz wrote:
>>>>>> Amusingly, the $(SORT) has an LC_ALL=C carefully placed before it,
>>>>>> but
>>>>>> the $(SED)s need it too!
>>>>> Yes, I think that's the correct fix in this case.
>>>>>> On Wed, Nov 19, 2014 at 5:18 PM, Martin Buchholz
>>>>>> <martinrb at google.com> wrote:
>>>>>>> [+ build-dev]
>>>>>>>
>>>>>>> I think I see the problem.  By default, a Unix pipeline sadly fails
>>>>>>> only when the last command fails.  In bash, you can change this to a
>>>>>>> more sensible default via
>>>>>>> set -o pipefail
>>>>>>> but that's not portable enough for openjdk.
>>>>> This is interesting and something I had missed. I will experiment
>>>>> with enabling pipefail if configure finds support for it. This will
>>>>> include setting the SHELL to bash. We really should fail instead of
>>>>> silently generating broken builds.
>>>>>
>>>>> Daniel, I can take over this bug if you want and work on a proper
>>>>> build fix.
>>>>
>>>> Thanks Erik! You are welcome!
>>>> So the whole issue seems to be that my default setting is
>>>> LC_ALL=en_US.UTF-8
>>>> whereas sed requires LC_ALL=C to work properly on these property
>>>> files...
>>>>
>>>> When the test first failed I tried to rerun the test with LC_ALL=C -
>>>> with no success
>>>> of course. But I never thought of rebuilding with LC_ALL=C :-(
>>>>
>>>> My apologies for the red herring :-(
>>>>
>>>> best regards
>>>>
>>>> -- daniel
>>>>
>>>>>
>>>>> /Erik
>>>>>>> $(CAT) $$< | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e
>>>>>>> 's/\([^\\]\)=/\1\\=/g' \
>>>>>>>         -e 's/\([^\\]\)!/\1\\!/g' -e 's/#.*/#/g' \
>>>>>>>     | $(SED) -f "$(SRC_ROOT)/make/common/support/unicode2x.sed" \
>>>>>>>     | $(SED) -e '/^#/d' -e '/^$$$$/d' \
>>>>>>>         -e :a -e '/\\$$$$/N; s/\\\n//; ta' \
>>>>>>>         -e 's/^[ ]*//;s/[ ]*$$$$//' \
>>>>>>>         -e 's/\\=/=/' | LC_ALL=C $(SORT) > $$@
>>>>>>>
>>>>>>> On Wed, Nov 19, 2014 at 5:07 PM, huizhe wang
>>>>>>> <huizhe.wang at oracle.com> wrote:
>>>>>>>> On 11/19/2014 4:09 PM, Mandy Chung wrote:
>>>>>>>>>
>>>>>>>>> On 11/19/2014 3:49 PM, Mandy Chung wrote:
>>>>>>>>>>
>>>>>>>>>> On 11/19/2014 12:50 PM, Daniel Fuchs wrote:
>>>>>>>>>>> On 11/19/14 9:36 PM, Mandy Chung wrote:
>>>>>>>>>>>> resources.jar will be gone when we move to the modular runtime
>>>>>>>>>>>> image
>>>>>>>>>>>> (JEP 220 [1]).
>>>>>>>>>>>> JDK-8065138 and JDK-8065365 will become non-issue in JDK 9.
>>>>>>>>>>> Do you mean that the property files will no longer be stripped
>>>>>>>>>>> of their
>>>>>>>>>>> comments?
>>>>>>>>>>
>>>>>>>>>> (sorry for my delay in reply as I was trying to get the number
>>>>>>>>>> of the
>>>>>>>>>> resources in the modular image vs resources.jar but got
>>>>>>>>>> distracted.)
>>>>>>>>>>
>>>>>>>>>> The current version copies all bytes when generating the modular
>>>>>>>>>> image.
>>>>>>>>>> This is a good question whether we should strip off the comments
>>>>>>>>>> when
>>>>>>>>>> writing to the modular runtime image.   I think we should look
>>>>>>>>>> at the
>>>>>>>>>> footprint and any performance saving and determine if we should
>>>>>>>>>> do the same
>>>>>>>>>> in JDK 9.
>>>>>>>>>>
>>>>>>>>> I looked at the exploded image under
>>>>>>>>> BUILD_OUTPUTDIR/jdk/modules/java.xml
>>>>>>>>> and found that the comments of Encodings.properties are stripped.
>>>>>>>>> I was
>>>>>>>>> confused with the mention of resources.jar that I assume it was a
>>>>>>>>> step
>>>>>>>>> stripping the comments before writing to resources.jar. This is
>>>>>>>>> still
>>>>>>>>> an issue in jigsaw m2 I believe.
>>>>>>>>>
>>>>>>>>> Where does the build strip the comments?
>>>>>>>>
>>>>>>>> A previous issue was that the build process strips off anything
>>>>>>>> after '#'
>>>>>>>> when copying properties files. In JDK8:
>>>>>>>> jaxp/make/BuildJaxp.gmk:
>>>>>>>> $(JAXP_OUTPUTDIR)/classes/%.properties:
>>>>>>>> $(JAXP_TOPDIR)/src/%.properties
>>>>>>>>          $(MKDIR) -p $(@D)
>>>>>>>>          $(RM) $@ $@.tmp
>>>>>>>>          $(CAT) $< | LANG=C $(NAWK) '{ sub(/#.*$$/,"#"); print }'
>>>>>>>> > $@.tmp
>>>>>>>>          $(MV) $@.tmp $@
>>>>>>>>
>>>>>>>> This was fixed in JDK 9.  The per-repo process was removed. It
>>>>>>>> looks like
>>>>>>>> the properties processing process is now consolidated into
>>>>>>>> make/common/JavaCompilation.gmk. So the issue Daniel found is new
>>>>>>>> (in terms
>>>>>>>> of stripping). Search 'properties files' to see the macro.
>>>>>>>>
>>>>>>>> Joe
>>>>>>>>
>>>>>>>>> Mandy
>>>>>>>>>
>>>>>
>>>>
>>>
>>
>




More information about the build-dev mailing list