Long sed commands
Fredrik Öhrström
oehrstroem at gmail.com
Sun Jul 15 02:49:43 PDT 2012
2012/7/13 Kelly O'Hair <kelly.ohair at oracle.com>:
>
> On Jul 12, 2012, at 9:26 AM, Mike Duigou wrote:
>
>> Does anyone have an explanation about why just these conversions are applied and a general translation similar to :
>>
>> \\u00\([0-9a-fA-F]{2,2}\) => \x\1
>>
>> can't be applied? Why can't the source file just be modified?
>
> I wondered about that too. The sources probably require the \u (property files), but we need \x if we want to process the
> text with Unix utilities? I'm just getting started understanding this stuff in detail.
It all goes back to the somewhat surprising stipulation that property
files have to be encoded in ISO 8859-1, unless you load them in a
special way. One solution is to use native2ascii and drop the sed
expression in the makefile and difftext.sh altogether. Then we should
start changing the actual sources for the properties files all over
the jdk, but up until now the build-infra project has avoided source
changes/cleanups.
My suggestion is that the properties file sources in the openjdk
should be encoded in pure 7-bit ASCII with \u escapes for everything
else, including the ISO 8859-1 characters. This way, you can open the
property files in any editor without relying on the editor's ability
to detect the actual encoding for you. If the sources were changed to
be encoded like this, they would work out of the box with the openjdk,
and the makefile could simply halt with an error if it detects
anything that is not pure 7-bit ASCII. No need for cleaning/stripping
of properties.
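As a rough sketch of what such a conversion and check could look like
(Python used purely for illustration; the function names
to_ascii_escapes and first_non_ascii are made up here and are not part
of any existing build tool):

```python
def to_ascii_escapes(text: str) -> str:
    """Return text with every non-ASCII character written as a \\uXXXX escape."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            # Characters outside the BMP become a UTF-16 surrogate pair.
            code -= 0x10000
            out.append("\\u%04X\\u%04X"
                       % (0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)))
    return "".join(out)


def first_non_ascii(data: bytes):
    """Return (line, column) of the first byte >= 0x80, or None if pure ASCII."""
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                return (line_no, col)
    return None
```

The first function would be a one-time cleanup of the sources, the
second is the kind of check the makefile could run to halt the build.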
Mike's suggestion does not work, since escapes like \x09 are
interpreted into binary values when the sed script is loaded. You
cannot create dynamic binary values using sed. And not only that,
\x09 is a GNU extension and does not work on BSD and probably not on
Solaris either! Thus the short-term solution is to replace it with
native2ascii! But running native2ascii means starting a JVM for each
file, so the preferable long-term solution is to clean up the property
source files and not do cleaning/stripping at all.
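For comparison, here is a minimal sketch of the general \u00XX
translation done in a language where the replacement can be computed
per match (Python, for illustration only; unescape_latin1 is a
hypothetical name). It assumes the input never contains an escaped
backslash directly before the u, which a real properties parser would
have to handle:

```python
import re

# Matches a literal backslash, 'u', '00', then two hex digits, e.g. \u00E9.
_U00_ESCAPE = re.compile(rb"\\u00([0-9a-fA-F]{2})")


def unescape_latin1(data: bytes) -> bytes:
    """Replace each \\u00XX escape with the single byte 0xXX."""
    return _U00_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
```

Escapes above \u00FF are left untouched, since they have no single-byte
ISO 8859-1 form.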
More background....
In the old build system, there are two StripProperties.java, one in
corba and one in the jdk, with slightly different implementations
and command line options. And there are two CompileProperties.java,
one in langtools and one in the jdk, also slightly different.
There is >also< a broken sed implementation of StripProperties
in the jdk, and sometimes the properties files are simply copied over.
The net result is that property files in the old build are
sometimes compiled to a bytecode class, and for those that remain text:
: is sometimes translated to \:
= is sometimes translated to \=
# comment is sometimes cleaned to only #, in a broken way, since a #
  inside a text string mutilates the text string.
# comment is sometimes removed completely.
\u00E9 is sometimes translated to binary E9, i.e. part of an
  ISO 8859-1 translation, but not always.
The keys are sometimes sorted into a stable, but seemingly random, order.
This makes it hard for the compareimages.sh script (in particular difftext.sh)
to compare the old and the new build.
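To illustrate the kind of normalization a text comparison needs in
order to see past these differences, here is a small sketch (Python,
for illustration; normalize_properties is a hypothetical helper, not
what difftext.sh actually does). It ignores continuation lines and
treats \: and \= as plain : and =, which is only acceptable for
comparing two builds, not for parsing:

```python
def normalize_properties(text: str) -> list:
    """Return sorted, comment-free property lines for text comparison."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        # Whole-line comments start with # or !; a # inside a value is kept.
        if not line or line[0] in "#!":
            continue
        # Treat \: and \= as equivalent to : and =.
        line = line.replace("\\:", ":").replace("\\=", "=")
        lines.append(line)
    return sorted(lines)
```

Two files that differ only in comments, escaping of : and =, or key
order then normalize to the same list.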
//Fredrik