From robermann at gmail.com  Sat Aug 28 00:18:59 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 09:18:59 +0200
Subject: Java.g [version 1.0.6]: are now non-Javadoc comments suppressed?
Message-ID: <AANLkTikYGRi7EzNsJTP--XQqdWPvCO0Xz8ikE4GQ81hq@mail.gmail.com>

Hi all

[I sent the following message to the antlr-interest mailing list,
sorry for the cross posting]

Maybe I'm overlooking something, but it seems to me that the Java.g
v.1.0.6 grammar (
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g )
does not emit the standard Java comments  ( /* COMMENT */ ).

When I process the following file with ANTLR 3.2:

/** Is this comment returned?*/
public class TestComment {
       /* Is this comment returned?*/
}

I get:
     /** Is this comment returned?*/publicclassTestComment{}

A previous version (*) returned:
     /** Is this comment returned?*/
     public class TestComment {
             /* Is this comment returned?*/
     }


Has this behaviour really changed, or am I missing something?

Thanks,

Roberto

(*) The previous version was based on Java.g v.1.0.5 (it also contains
some template calls):
http://codesounding.svn.sourceforge.net/viewvc/codesounding/CodeSounding/trunk/src/codesounding/antlr/JavaRewrite.g?view=markup

From robermann at gmail.com  Sat Aug 28 00:19:20 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 09:19:20 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
Message-ID: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>

Hello

[I sent the following message to the antlr-interest mailing list,
sorry for the cross posting]

I'm trying to understand whether the Java grammar from
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
correctly processes Unicode escape sequences.

In the file's header I read:
<<
 *  Know problems:
 *    Won't pass input containing unicode sequence like this
 *      char c = '\uffff'
 *      String s = "\uffff";
 *    Because Antlr does not treat '\uffff' as an valid char. This will be fixed in the next Antlr
 *    release. [Fixed in Antlr-3.1.1]
>>

So it seems that ANTLR 3.2 should handle such Unicode sequences.
However, when I try to parse the following class:

public class TestUnicode {
        public static void test (String[] args){
                char c = '\uffff';
        }
}

I get the following errors:
     line 3:27 no viable alternative at character 'u'
     line 3:34 mismatched character '\r' expecting '''
     line 1:7 mismatched input 'class' expecting MONKEYS_AT
     line 2:22 mismatched input 'void' expecting MONKEYS_AT
     line 3:21 mismatched input 'c' expecting DOT
     line 3:23 no viable alternative at input '='
     line 4:8 no viable alternative at input '}'
     line 4:8 no viable alternative at input '}'

If I replace the Unicode escape with an ordinary character, it of
course works. Am I missing anything? Please note that version 1.0.5
didn't have this problem.

Thanks for your help.

Roberto

From yang.jiang.z at gmail.com  Sat Aug 28 00:38:09 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 15:38:09 +0800
Subject: Java.g [version 1.0.6]: are now non-Javadoc comments suppressed?
In-Reply-To: <AANLkTikYGRi7EzNsJTP--XQqdWPvCO0Xz8ikE4GQ81hq@mail.gmail.com>
References: <AANLkTikYGRi7EzNsJTP--XQqdWPvCO0Xz8ikE4GQ81hq@mail.gmail.com>
Message-ID: <4C78BCE1.3010209@gmail.com>

What you got is right: standard comments, line comments ( "// comment" )
and whitespace are skipped.

Just search for "skip" in java.g and you'll see.
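
A toy illustration in plain Java, not ANTLR, of why the output comes
back as one run of text: once whitespace and ordinary comments are
dropped by the lexer, a tool that rebuilds the source by concatenating
tokens has nothing left to put between them. The token classes in the
regular expression and the names SkipDemo and rejoin are made up for
this sketch; they are not the lexer rules from Java.g.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SkipDemo {
    private SkipDemo() {}

    // Toy token classes (assumptions for this sketch, not the Java.g rules):
    // Javadoc comment, ordinary block or line comment, whitespace, word,
    // and any other single character.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<javadoc>/\\*\\*.*?\\*/)"
        + "|(?<comment>/\\*.*?\\*/|//[^\\r\\n]*)"
        + "|(?<ws>\\s+)"
        + "|(?<word>\\w+)"
        + "|(?<other>.)",
        Pattern.DOTALL);

    /**
     * Rebuilds the source by concatenating tokens. When skipTrivia is true,
     * whitespace and non-Javadoc comments are dropped, imitating a lexer
     * that skips them.
     */
    static String rejoin(String source, boolean skipTrivia) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            boolean trivia = m.group("comment") != null || m.group("ws") != null;
            if (skipTrivia && trivia) {
                continue; // the token never reaches the consumer
            }
            out.append(m.group());
        }
        return out.toString();
    }
}
```

With skipTrivia set to true, the TestComment example comes back as
/** Is this comment returned?*/publicclassTestComment{}; with false it
comes back unchanged. In an ANTLR 3 grammar, the usual way to keep such
tokens available is to send them to the hidden channel
($channel=HIDDEN) instead of calling skip().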

yang



From yang.jiang.z at gmail.com  Sat Aug 28 00:42:19 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 15:42:19 +0800
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>
References: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>
Message-ID: <4C78BDDB.8050302@gmail.com>

Try changing '\uffff' to some other valid character, such as '\u0096'.
If that works, it looks like the problem has come back in ANTLR 3.2.


yang



From robermann at gmail.com  Sat Aug 28 03:04:32 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 12:04:32 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <4C78BDDB.8050302@gmail.com>
References: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>
	<4C78BDDB.8050302@gmail.com>
Message-ID: <AANLkTinkTtcU_VmF6A2Roe3Wd1siFjkmWo+8tD-RtWZz@mail.gmail.com>

It does not work with '\u0096' either. So should I (re)open a bug? Where?


From yang.jiang.z at gmail.com  Sat Aug 28 04:11:27 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 19:11:27 +0800
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <AANLkTinkTtcU_VmF6A2Roe3Wd1siFjkmWo+8tD-RtWZz@mail.gmail.com>
References: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>	<4C78BDDB.8050302@gmail.com>
	<AANLkTinkTtcU_VmF6A2Roe3Wd1siFjkmWo+8tD-RtWZz@mail.gmail.com>
Message-ID: <4C78EEDF.1030102@gmail.com>

I looked at the grammar again and there is a misunderstanding here.

If you read the grammar carefully, in particular the STRINGLITERAL
rule, you'll notice that the grammar was never meant to handle input
like '\unnnn'.

When a Java program is parsed, the input is first fed into a
tokenizer, and the tokenizer then emits tokens to the parser. Inputs
like '\unnnn' are transformed into the corresponding characters before
they reach the parser, either in the tokenizer or even before it. This
is how the Sun javac parser does it, and Java.g is designed to work the
same way. If you want to use Java.g and handle inputs like '\unnnn',
you'll have to implement this translation step yourself.
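
A minimal sketch of such a translation step, assuming it runs over the
whole source text before that text is handed to the generated lexer.
The class name UnicodeEscapes and the method translate are made up for
illustration and are not part of Java.g or ANTLR. The sketch follows
the rule from the Java Language Specification that a backslash only
starts a Unicode escape when it is preceded by an even number of
backslashes. Malformed escapes are copied through unchanged here,
whereas javac would report an error.

```java
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns src with every well-formed Unicode escape replaced by its character. */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            int j = i + 1;
            if (j < n && src.charAt(j) == 'u') {
                // Any number of 'u' characters may follow the backslash.
                while (j < n && src.charAt(j) == 'u') {
                    j++;
                }
                if (hasFourHexDigits(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                } else {
                    // Malformed escape: copied unchanged (an assumption of this sketch).
                    out.append(c);
                    i++;
                }
                continue;
            }
            // A backslash that does not start an escape: copy it together with
            // the next character, so the second backslash of a pair is never
            // examined as a possible escape start.
            out.append(c);
            i++;
            if (i < n) {
                out.append(src.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'f')
                    || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

The translated string can then be wrapped in whatever character stream
the lexer expects, for example ANTLR 3's ANTLRStringStream. Note that
line and column positions reported by the lexer will then refer to the
translated text, not to the original file.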

I guess you are using ANTLRWorks to test the grammar, but ANTLRWorks
won't do the transformation for you; that's why you get the errors.

Then what does this note mean, and why is it worded like this?

  Won't pass input containing unicode sequence like this
       char c = '\uffff'
       String s = "\uffff";


'\uffff' is a character like any other, say 'a', 'b' or '"', only it
has no visual representation. But it can still be present in any Java
file. What this section really means is that ANTLR cannot handle the
character represented by '\uffff'.

Hope this solves your problem.

y




From robermann at gmail.com  Sat Aug 28 05:12:57 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 14:12:57 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <4C78EEDF.1030102@gmail.com>
References: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>
	<4C78BDDB.8050302@gmail.com>
	<AANLkTinkTtcU_VmF6A2Roe3Wd1siFjkmWo+8tD-RtWZz@mail.gmail.com>
	<4C78EEDF.1030102@gmail.com>
Message-ID: <AANLkTinxUkrq5ONL8EKXpYeFccO6JZWdH0OHqX_5Ygje@mail.gmail.com>

So are you saying that if I have the following source file:
public class TestUnicode {
        public static void test (String[] args){
                char c = '\u0096';
        }
}
which of course compiles, Java.g is not supposed to parse it? At the
moment it does not.

Please note that version 1.0.5 handled it correctly (I'm running some
regression tests to decide whether to migrate to v. 1.0.6 or not).

Thanks for your suggestions,
Roberto


From yang.jiang.z at gmail.com  Sat Aug 28 05:46:48 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 20:46:48 +0800
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <AANLkTinxUkrq5ONL8EKXpYeFccO6JZWdH0OHqX_5Ygje@mail.gmail.com>
References: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>	<4C78BDDB.8050302@gmail.com>	<AANLkTinkTtcU_VmF6A2Roe3Wd1siFjkmWo+8tD-RtWZz@mail.gmail.com>	<4C78EEDF.1030102@gmail.com>
	<AANLkTinxUkrq5ONL8EKXpYeFccO6JZWdH0OHqX_5Ygje@mail.gmail.com>
Message-ID: <4C790538.2060105@gmail.com>

Are you referring to these versions, as they appear in the comment?

  *  Version 1.0.5 -- Terence, June 21, 2007
  *  --a[i].foo didn't work. Fixed unaryExpression
  *
  *  Version 1.0.6 -- John Ridgway, March 17, 2008
  *      Made "assert" a switchable keyword like "enum".
  *      Fixed compilationUnit to disallow "annotation importDeclaration ...".


The Java.g at
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
(let's call it the openjdk Java.g) is different from the Version 1.0.5
or 1.0.6 you are referring to, although it is derived from the same
source: the one written by Terence (let's call it the Terence Java.g).

If you read the comment in the openjdk Java.g carefully, you'll notice
the line "Below are comments found in the original version. ".
Everything below that line is the comment from the Terence Java.g. It's
kept that way to respect the copyright.

The openjdk Java.g is developed at Sun and is VERY well tested. It also
has some significant changes compared to the Terence Java.g. One of
those changes is the one you have noticed: the grammar does not handle
Unicode escape sequences. The following rule from the Terence Java.g
was taken out:

UnicodeEscape
     :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
     ;

You have probably read about the other changes in the comment.

That said, although it comes from the same source as your version
1.0.5, it should not be considered a successor to 1.0.5 or 1.0.6. You
might want to do more research before upgrading.

yang




From robermann at gmail.com  Sat Aug 28 06:19:10 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 15:19:10 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <4C790538.2060105@gmail.com>
References: <AANLkTimqv8xvp4ue4jzUoVbNn01HsH-y2jQhUgv9o4BA@mail.gmail.com>
	<4C78BDDB.8050302@gmail.com>
	<AANLkTinkTtcU_VmF6A2Roe3Wd1siFjkmWo+8tD-RtWZz@mail.gmail.com>
	<4C78EEDF.1030102@gmail.com>
	<AANLkTinxUkrq5ONL8EKXpYeFccO6JZWdH0OHqX_5Ygje@mail.gmail.com>
	<4C790538.2060105@gmail.com>
Message-ID: <AANLkTinPF8WACuGfoa0WavgSvU03S4goVkxyMyi5czzj@mail.gmail.com>

Hi Yang
Thanks, now I get it. Although I did read those comments, I could not
infer from them the "significant changes compared to the Terence
Java.g": standard comments and whitespace are skipped, and Unicode
escapes are no longer handled. IMHO these would be worth an explicit
note in the file's changelog section; otherwise they only surface
through a textual diff against Terence's version.

Thanks again,
Roberto

