Does Java.g [version 1.0.6] handle unicode characters?
Roberto Mannai
robermann at gmail.com
Sat Aug 28 05:12:57 PDT 2010
So are you saying that if I have the following source file:
public class TestUnicode {
    public static void test (String[] args){
        char c = '\u0096';
    }
}
which of course compiles, Java.g is not supposed to parse it? At the moment
it fails.
Please note that version 1.0.5 handled it correctly (I'm running
some regression tests to decide whether to migrate to v. 1.0.6
or not).
Thanks for your suggestions,
Roberto
On Sat, Aug 28, 2010 at 1:11 PM, Yang Jiang <yang.jiang.z at gmail.com> wrote:
> I looked at the grammar again and there is a misunderstanding here.
>
> If you read the grammar carefully, in particular the STRINGLITERAL part,
> you'll notice the grammar was never supposed to handle input like '\unnnn'.
>
> What happens when you parse a Java program is that the input is first fed
> into a tokenizer, which then emits tokens to the parser. Inputs like
> '\unnnn' are transformed into the corresponding characters before they reach
> the parser, either in the tokenizer or even before it. This is how the Sun
> javac parser does it, and Java.g is designed to work the same way. If you are
> going to use Java.g on input containing '\unnnn' sequences, you'll have to
> implement this step yourself.
>
> I guess you are using ANTLRWorks to test the grammar, but ANTLRWorks won't
> do the transformation for you; that's why you get the error.
>
> Then what does this note mean, and why is it put like this?
>
> Won't pass input containing unicode sequence like this
> char c = '\uffff'
> String s = "\uffff";
>
>
> '\uffff' is a character like any other, say 'a', 'b' or '"', except that it
> has no visual representation. It can still be present in any Java file.
> What this section really means is that Antlr cannot handle the actual
> character that '\uffff' represents.
>
> Hope this solves your problem.
>
> y
>
>
>
>
> On 08/28/2010 06:04 PM, Roberto Mannai wrote:
>>
>> Yes, it does not work with '\u0096' either. So should I (re)open a bug?
>> Where?
>>
>> On Sat, Aug 28, 2010 at 9:42 AM, Yang Jiang <yang.jiang.z at gmail.com> wrote:
>>
>>>
>>> You can change '\uffff' to some other valid character, such as '\u0096'.
>>> If that works, then it looks like the problem has come back in Antlr 3.2.
>>>
>>>
>>> yang
>>>
>>> On 08/28/2010 03:19 PM, Roberto Mannai wrote:
>>>
>>>>
>>>> Hello
>>>>
>>>> [I sent the following message to the antlr-interest mailing list,
>>>> sorry for the cross posting]
>>>>
>>>> I'm trying to understand whether the Java grammar from
>>>> http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
>>>> correctly processes Unicode characters or not.
>>>>
>>>> In the file's header I read:
>>>> <<
>>>> * Know problems:
>>>> * Won't pass input containing unicode sequence like this
>>>> * char c = '\uffff'
>>>> * String s = "\uffff";
>>>> * Because Antlr does not treat '\uffff' as an valid char. This will be fixed in the next Antlr
>>>> * release. [Fixed in Antlr-3.1.1]
>>>> >>
>>>>
>>>> So it seems that Antlr 3.2 should handle the Unicode charset. However,
>>>> when I try to parse the following class:
>>>>
>>>> public class TestUnicode {
>>>>     public static void test (String[] args){
>>>>         char c = '\uffff';
>>>>     }
>>>> }
>>>>
>>>> I get the following error:
>>>> line 3:27 no viable alternative at character 'u'
>>>> line 3:34 mismatched character '\r' expecting '''
>>>> line 1:7 mismatched input 'class' expecting MONKEYS_AT
>>>> line 2:22 mismatched input 'void' expecting MONKEYS_AT
>>>> line 3:21 mismatched input 'c' expecting DOT
>>>> line 3:23 no viable alternative at input '='
>>>> line 4:8 no viable alternative at input '}'
>>>> line 4:8 no viable alternative at input '}'
>>>>
>>>> If I replace the Unicode escape, it works, of course. Am I missing
>>>> anything? Please note that version 1.0.5 didn't have this problem.
>>>>
>>>> Thanks for your help.
>>>>
>>>> Roberto
>>>>
>>>>
>>>
>>>
>
>
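Yang's point is that Unicode escapes (a backslash, one or more 'u' characters, four hex digits) are translated before tokenizing, as javac does, so anyone driving Java.g directly has to perform that step. A minimal sketch of such a pre-pass follows, assuming the rules of JLS section 3.3: only a backslash preceded by an even number of contiguous raw backslashes can start an escape, and a translated character is never re-scanned. The class name `UnicodeEscapeTranslator` and its methods are hypothetical; they are not part of Java.g or the ANTLR runtime.

```java
import java.util.Objects;

/**
 * Hypothetical pre-pass (not part of Java.g or the ANTLR runtime) that performs
 * the Unicode-escape translation javac applies before tokenizing (JLS section 3.3).
 */
public final class UnicodeEscapeTranslator {

    private UnicodeEscapeTranslator() {
    }

    /** Replaces each Unicode escape with the UTF-16 code unit it denotes. */
    public static String translate(String source) {
        Objects.requireNonNull(source, "source");
        StringBuilder out = new StringBuilder(source.length());
        int precedingBackslashes = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            boolean eligible = c == '\\' && precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++; // the JLS allows one or more 'u' characters
                }
                if (j + 4 > source.length() || !isHex4(source, j)) {
                    throw new IllegalArgumentException("Malformed Unicode escape at index " + i);
                }
                out.append((char) Integer.parseInt(source.substring(j, j + 4), 16));
                i = j + 4;
                // The translated character is never re-scanned, and the last raw
                // character consumed was a hex digit, so the backslash run ends here.
                precedingBackslashes = 0;
            } else {
                out.append(c);
                precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    /** True if the four characters starting at index from are ASCII hex digits. */
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The literal holds two escapes as raw text (each backslash is doubled for javac).
        System.out.println(translate("String s = \"\\u0041\\u0042\";")); // prints: String s = "AB";
    }
}
```

Under this sketch, Roberto's `char c = '\u0096';` would reach the lexer as a three-character literal containing the raw U+0096 character, and the resulting String could then be wrapped in an ANTLR 3 `ANTLRStringStream` and passed to the lexer generated from Java.g. Whether that lexer accepts a raw U+FFFF is a separate question, per Yang's last remark about Antlr itself.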
More information about the compiler-grammar-dev mailing list