<i18n dev> RL1.2 Properties (part 1 of 2)

Tom Christiansen tchrist at perl.com
Sun Jan 23 00:18:56 PST 2011


Sherman wrote:

> The Unicode/java version of lowercase, uppercase, whitespace
> and letter character classes are provided via \p{javaXYZ},

I'm afraid that is *not* true; please see part 2.

> and the \p{Lower/Upper/Alpha/Space} are specified/implemented
> for POSIX version, which is clearly documented in the API
> document. I would not use "worst" for this. I don't think the
> "conformance" requests the implementation to use exactly the
> name specified in standard.

> The following classes/properties are actually
> supported/implemented, while only the \p{javaLowerCase},
> \p{javaUpperCase}, \p{javaWhitespace} and \p{javaMirrored} are
> explicitly documented in Pattern API, the rest are covered by
> notes as "Categories that behave like the java.lang.Character
> boolean ismethodname methods are available through the same
> \p{prop} syntax..."

>     \p{javaLowerCase}
>     \p{javaUpperCase}
>     \p{javaTitleCase}
>     \p{javaDigit}
>     \p{javaDefined}
>     \p{javaLetter}
>     \p{javaLetterOrDigit}
>     \p{javaJavaIdentifierStart}
>     \p{javaJavaIdentifierPart}
>     \p{javaUnicodeIdentifierStart}
>     \p{javaUnicodeIdentifierPart}
>     \p{javaIdentifierIgnorable}
>     \p{javaSpaceChar}
>     \p{javaWhitespace}
>     \p{javaISOControl}
>     \p{javaMirrored}

Last I checked there was also a \p{javaJavaIdentifierPart}, which is
pretty silly, I think.
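
For readers following along, here is a minimal sketch of how the two
families quoted above differ on non-ASCII input, assuming default
Pattern flags. The class and method names (JavaVsPosixClasses,
posixLower, javaLower) are made up for illustration; only
java.util.regex.Pattern and Character.isLowerCase are real API.

```java
import java.util.regex.Pattern;

public class JavaVsPosixClasses {
    // POSIX class: with default flags it matches only ASCII a-z.
    static final Pattern POSIX_LOWER = Pattern.compile("\\p{Lower}");
    // javaXYZ class: documented to behave like Character.isLowerCase.
    static final Pattern JAVA_LOWER = Pattern.compile("\\p{javaLowerCase}");

    // Hypothetical helpers: true if the whole string is one matching character.
    static boolean posixLower(String s) {
        return POSIX_LOWER.matcher(s).matches();
    }

    static boolean javaLower(String s) {
        return JAVA_LOWER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // ASCII a, e with acute (U+00E9), Greek alpha (U+03B1), ASCII A
        String[] samples = { "a", "\u00E9", "\u03B1", "A" };
        for (String s : samples) {
            char c = s.charAt(0);
            System.out.printf("U+%04X  \\p{Lower}=%b  \\p{javaLowerCase}=%b  isLowerCase=%b%n",
                    (int) c, posixLower(s), javaLower(s), Character.isLowerCase(c));
        }
    }
}
```

The last two columns should agree by construction; the first column is
where the ASCII-only POSIX definition shows up. Neither column is a
lookup of the Unicode Lowercase property under its standard name.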

> It appears the "noncharacter_cp" and "default_ignorable_cp" are
> missing from the list, will take a look later, but I guess
> these 2 are really not that "significant".

They are two of the eleven properties which must be supported to
meet RL1.2 compliance, and therefore Level 1 compliance.
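
To make the first of those concrete: Noncharacter_Code_Point is a
fixed set of 66 code points, namely U+FDD0..U+FDEF plus the last two
code points of each of the 17 planes. Below is a rough sketch of
testing it by hand in Java when no \p{...} name is available.
isNoncharacter is a hypothetical helper, not part of java.util.regex
or java.lang.Character.

```java
public class Noncharacters {
    // Hypothetical helper: true if cp is one of the 66 Unicode noncharacters.
    static boolean isNoncharacter(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            return false;                    // outside the Unicode code space
        }
        if (cp >= 0xFDD0 && cp <= 0xFDEF) {
            return true;                     // the contiguous block of 32
        }
        return (cp & 0xFFFE) == 0xFFFE;      // xxFFFE or xxFFFF in any plane
    }

    // Counts noncharacters across the whole code space as a sanity check.
    static int countNoncharacters() {
        int count = 0;
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            if (isNoncharacter(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("noncharacters found: " + countNoncharacters());
    }
}
```

Default_Ignorable_Code_Point is a derived property with a much less
regular definition, so it is not something one would want to hand-roll
this way.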

Having access to the real Unicode properties is more important
than having these java versions, which don't work right.

See part 2, please.

--tom


