RFR: 8065138 - Encodings.isRecognizedEnconding sometimes fails to recognize 'UTF8'

Erik Joelsson erik.joelsson at oracle.com
Thu Nov 20 13:36:48 UTC 2014


Here is my proposal for fixing the particular issue of generating the 
correct properties files. I'm simply adding LC_ALL=C to the whole 
command line instead of just the sort at the end. It seems to require 
using "export" to get picked up.

Bug: https://bugs.openjdk.java.net/browse/JDK-8065138
Patch:
diff --git a/make/common/JavaCompilation.gmk 
b/make/common/JavaCompilation.gmk
--- a/make/common/JavaCompilation.gmk
+++ b/make/common/JavaCompilation.gmk
@@ -400,13 +400,15 @@
    # Now we can setup the depency that will trigger the copying.
    $$($1_BIN)$$($2_TARGET) : $2
      $(MKDIR) -p $$(@D)
-    $(CAT) $$< | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e 
's/\([^\\]\)=/\1\\=/g' \
+    export LC_ALL=C ; $(CAT) $$< \
+        | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e 's/\([^\\]\)=/\1\\=/g' \
              -e 's/\([^\\]\)!/\1\\!/g' -e 's/#.*/#/g' \
          | $(SED) -f "$(SRC_ROOT)/make/common/support/unicode2x.sed" \
          | $(SED) -e '/^#/d' -e '/^$$$$/d' \
              -e :a -e '/\\$$$$/N; s/\\\n//; ta' \
              -e 's/^[     ]*//;s/[     ]*$$$$//' \
-            -e 's/\\=/=/' | LC_ALL=C $(SORT) > $$@
+            -e 's/\\=/=/' \
+        | $(SORT) > $$@
      $(CHMOD) -f ug+w $$@

    # And do not forget this target


I filed a separate issue [1] for investigating the use of pipefail.

/Erik

[1] https://bugs.openjdk.java.net/browse/JDK-8065576

On 2014-11-20 10:34, Daniel Fuchs wrote:
> On 11/20/14 10:26 AM, Erik Joelsson wrote:
>> Hello,
>>
>> On 2014-11-20 02:20, Martin Buchholz wrote:
>>> Amusingly, the $(SORT) has an LC_ALL=C carefully placed before it, but
>>> the $(SED)s need it too!
>> Yes, I think that's the correct fix in this case.
>>> On Wed, Nov 19, 2014 at 5:18 PM, Martin Buchholz 
>>> <martinrb at google.com> wrote:
>>>> [+ build-dev]
>>>>
>>>> I think I see the problem.  By default, a Unix pipeline sadly fails
>>>> only when the last command fails.  In bash, you can change this to a
>>>> more sensible default via
>>>> set -o pipefail
>>>> but that's not portable enough for openjdk.
>> This is interesting and something I had missed. I will experiment 
>> with enabling pipefail if configure finds support for it. This will 
>> include setting the SHELL to bash. We really should fail instead of 
>> silently generating broken builds.
>>
>> Daniel, I can take over this bug if you want and work on a proper 
>> build fix.
>
> Thanks Erik! You are welcome!
> So the whole issue seems to be that my default setting is 
> LC_ALL=en_US.UTF-8
> whereas sed requires LC_ALL=C to work properly on these property files...
>
> When the test first failed I tried to rerun the test with LC_ALL=C - 
> with no success
> of course. But I never thought of rebuilding with LC_ALL=C :-(
>
> My apologies for the red herring :-(
>
> best regards
>
> -- daniel
>
>>
>> /Erik
>>>> $(CAT) $$< | $(SED) -e 's/\([^\\]\):/\1\\:/g' -e 
>>>> 's/\([^\\]\)=/\1\\=/g' \
>>>>         -e 's/\([^\\]\)!/\1\\!/g' -e 's/#.*/#/g' \
>>>>     | $(SED) -f "$(SRC_ROOT)/make/common/support/unicode2x.sed" \
>>>>     | $(SED) -e '/^#/d' -e '/^$$$$/d' \
>>>>         -e :a -e '/\\$$$$/N; s/\\\n//; ta' \
>>>>         -e 's/^[ ]*//;s/[ ]*$$$$//' \
>>>>         -e 's/\\=/=/' | LC_ALL=C $(SORT) > $$@
>>>>
>>>> On Wed, Nov 19, 2014 at 5:07 PM, huizhe wang 
>>>> <huizhe.wang at oracle.com> wrote:
>>>>> On 11/19/2014 4:09 PM, Mandy Chung wrote:
>>>>>>
>>>>>> On 11/19/2014 3:49 PM, Mandy Chung wrote:
>>>>>>>
>>>>>>> On 11/19/2014 12:50 PM, Daniel Fuchs wrote:
>>>>>>>> On 11/19/14 9:36 PM, Mandy Chung wrote:
>>>>>>>>> resources.jar will be gone when we move to the modular runtime 
>>>>>>>>> image
>>>>>>>>> (JEP 220 [1]).
>>>>>>>>> JDK-8065138 and JDK-8065365 will become non-issue in JDK 9.
>>>>>>>> Do you mean that the property files will no longer be stripped 
>>>>>>>> of their
>>>>>>>> comments?
>>>>>>>
>>>>>>> (sorry for my delay in reply as I was trying to get the number 
>>>>>>> of the
>>>>>>> resources in the modular image vs resources.jar but got 
>>>>>>> distracted.)
>>>>>>>
>>>>>>> The current version copies all bytes when generating the modular 
>>>>>>> image.
>>>>>>> This is a good question whether we should strip off the comments 
>>>>>>> when
>>>>>>> writing to the modular runtime image.   I think we should look 
>>>>>>> at the
>>>>>>> footprint and any performance saving and determine if we should 
>>>>>>> do the same
>>>>>>> in JDK 9.
>>>>>>>
>>>>>> I looked at the exploded image under 
>>>>>> BUILD_OUTPUTDIR/jdk/modules/java.xml
>>>>>> and found that the comments of Encodings.properties are stripped. 
>>>>>> I was
>>>>>> confused with the mention of resources.jar that I assume it was a 
>>>>>> step
>>>>>> stripping the comments before writing to resources.jar. This is 
>>>>>> still
>>>>>> an issue in jigsaw m2 I believe.
>>>>>>
>>>>>> Where does the build strip the comments?
>>>>>
>>>>> A previous issue was that the build process strips off anything 
>>>>> after '#'
>>>>> when copying properties files. In JDK8:
>>>>> jaxp/make/BuildJaxp.gmk:
>>>>> $(JAXP_OUTPUTDIR)/classes/%.properties: 
>>>>> $(JAXP_TOPDIR)/src/%.properties
>>>>>          $(MKDIR) -p $(@D)
>>>>>          $(RM) $@ $@.tmp
>>>>>          $(CAT) $< | LANG=C $(NAWK) '{ sub(/#.*$$/,"#"); print }' 
>>>>> > $@.tmp
>>>>>          $(MV) $@.tmp $@
>>>>>
>>>>> This was fixed in JDK 9.  The per-repo process was removed. It 
>>>>> looks like
>>>>> the properties processing process is now consolidated into
>>>>> make/common/JavaCompilation.gmk. So the issue Daniel found is new 
>>>>> (in terms
>>>>> of stripping). Search 'properties files' to see the macro.
>>>>>
>>>>> Joe
>>>>>
>>>>>> Mandy
>>>>>>
>>
>




More information about the build-dev mailing list