From robermann at gmail.com  Sat Aug 28 00:18:59 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 09:18:59 +0200
Subject: Java.g [version 1.0.6]: are now non-Javadoc comments suppressed?
Message-ID: 

Hi all

[I sent the following message to the antlr-interest mailing list,
sorry for the cross posting]

Maybe I'm overlooking something, but it seems to me that the Java.g
v.1.0.6 grammar (
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g )
does not emit the standard Java comments ( /* COMMENT */ ).

When I process the following file, with antlr 3.2:

/** Is this comment returned?*/
public class TestComment {
   /* Is this comment returned?*/
}

I get:
/** Is this comment returned?*/publicclassTestComment{}

One "previous version" (*) returned:
/** Is this comment returned?*/
public class TestComment {
   /* Is this comment returned?*/
}

Is this behaviour really changed or am I missing/forgetting anything?

Thanks,

Roberto

(*) The previous version was based on Java.g v.1.0.5 (it contains also
some template calls, anyway):
http://codesounding.svn.sourceforge.net/viewvc/codesounding/CodeSounding/trunk/src/codesounding/antlr/JavaRewrite.g?view=markup

From robermann at gmail.com  Sat Aug 28 00:19:20 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 09:19:20 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
Message-ID: 

Hello

[I sent the following message to the antlr-interest mailing list,
sorry for the cross posting]

I'm trying to understand whether the Java grammar from
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
processes correctly the Unicode chars or not.

In the file's header I read:
<<
 *  Know problems:
 *    Won't pass input containing unicode sequence like this
 *      char c = '\uffff'
 *      String s = "\uffff";
 *    Because Antlr does not treat '\uffff' as an valid char. This will be fixed in the next Antlr
 *    release.
[Fixed in Antlr-3.1.1]
>>

So, it seems that antlr 3.2 should handle the Unicode charset. Anyway,
when I try to parse the following class:

public class TestUnicode {
        public static void test (String[] args){
                char c = '\uffff';
        }
}

I get the following errors:
     line 3:27 no viable alternative at character 'u'
     line 3:34 mismatched character '\r' expecting '''
     line 1:7 mismatched input 'class' expecting MONKEYS_AT
     line 2:22 mismatched input 'void' expecting MONKEYS_AT
     line 3:21 mismatched input 'c' expecting DOT
     line 3:23 no viable alternative at input '='
     line 4:8 no viable alternative at input '}'
     line 4:8 no viable alternative at input '}'

If I replace the unicode character it of course works. Am I missing
anything? Please note that version 1.0.5 didn't have this problem.

Thanks for your help.

Roberto

From yang.jiang.z at gmail.com  Sat Aug 28 00:38:09 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 15:38:09 +0800
Subject: Java.g [version 1.0.6]: are now non-Javadoc comments suppressed?
In-Reply-To: 
References: 
Message-ID: <4C78BCE1.3010209@gmail.com>

What you got is right: standard comments, line comments ( "// comment" ) and
white space are skipped.
Just search for "skip" in Java.g and you'll see.

yang
From yang.jiang.z at gmail.com  Sat Aug 28 00:42:19 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 15:42:19 +0800
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: 
References: 
Message-ID: <4C78BDDB.8050302@gmail.com>

You can change '\uffff' to some other valid char like '\u0096'.
If that works, then it looks like the problem is back in Antlr 3.2.

yang
From robermann at gmail.com  Sat Aug 28 03:04:32 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 12:04:32 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <4C78BDDB.8050302@gmail.com>
References: <4C78BDDB.8050302@gmail.com>
Message-ID: 

Yes, it does not work with '\u0096' either. So am I supposed to (re)open a bug? Where?
From yang.jiang.z at gmail.com  Sat Aug 28 04:11:27 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 19:11:27 +0800
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: 
References: <4C78BDDB.8050302@gmail.com>
Message-ID: <4C78EEDF.1030102@gmail.com>

I looked at the grammar again and there is a misunderstanding here.

If you read the grammar carefully, the STRINGLITERAL part, you'll notice
the grammar is never supposed to handle input like '\unnnn'.

What happens when you parse a Java program is that the input is first fed
into a tokenizer, then the tokenizer emits tokens to the parser. Inputs like
'\unnnn' are transformed to the corresponding characters before being fed
into the parser, in the tokenizer or even before the tokenizer. This is how
the Sun javac parser does it, and Java.g is designed to act this way too.
If you are going to use Java.g and handle inputs like '\unnnn' you'll have
to implement this process yourself.
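As an illustration of the pre-processing step described above, here is a
minimal sketch in Java, under stated assumptions. The class name
UnicodeEscapes and the method translate are hypothetical; they are not part
of Java.g, ANTLR, or javac. The sketch rewrites each backslash-u escape into
the character it denotes, treating a backslash as the start of an escape only
when it is not itself escaped by a preceding backslash. Malformed escapes are
simply left untouched here, whereas javac reports them as errors.

```java
/**
 * Hypothetical helper (not part of Java.g or ANTLR): translates
 * backslash-u escapes in Java source text before it reaches the lexer.
 */
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns the source with every well-formed backslash-u escape replaced. */
    static String translate(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int n = source.length();
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            if (c != '\\' || i + 1 >= n) {
                // Ordinary character, or a lone backslash at the very end.
                out.append(c);
                i++;
            } else if (source.charAt(i + 1) == '\\') {
                // Escaped backslash: copy the pair so the second backslash
                // cannot be mistaken for the start of an escape.
                out.append("\\\\");
                i += 2;
            } else if (source.charAt(i + 1) == 'u') {
                int j = i + 1;
                // The language allows one or more 'u' characters.
                while (j < n && source.charAt(j) == 'u') {
                    j++;
                }
                int value = (j + 4 <= n) ? parseHex4(source, j) : -1;
                if (value >= 0) {
                    out.append((char) value);
                    i = j + 4;
                } else {
                    // Assumption of this sketch: leave malformed input as-is.
                    out.append(c);
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Parses four ASCII hex digits starting at 'start'; returns -1 if invalid. */
    private static int parseHex4(String s, int start) {
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            int digit = hexValue(s.charAt(k));
            if (digit < 0) {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }
}
```

The translated string could then be wrapped in whatever character stream the
generated lexer is given (for ANTLR 3, probably an ANTLRStringStream). One
trade-off of translating up front is that column numbers in error messages no
longer match the original file on lines that contained escapes.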

I guess you are using ANTLRWorks to test the grammar, but ANTLRWorks won't
do the transformation for you; that's why you get the error.

Then what does this mean, and why is it put like this?

  Won't pass input containing unicode sequence like this
      char c = '\uffff'
      String s = "\uffff";

'\uffff' is like any other char, say 'a', 'b', '"', only there is no visual
representation. But it can still be present in any Java file.
What this section really means is that Antlr cannot handle the character
represented by '\uffff'.

Hope this solves your problem.

y
From robermann at gmail.com  Sat Aug 28 05:12:57 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 14:12:57 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <4C78EEDF.1030102@gmail.com>
References: <4C78BDDB.8050302@gmail.com> <4C78EEDF.1030102@gmail.com>
Message-ID: 

So are you saying that if I have the following source file:
public class TestUnicode {
        public static void test (String[] args){
                char c = '\u0096';
        }
}
which of course compiles, Java.g is not supposed to parse it? Now it
does not work.

Please note that with version 1.0.5 it was handled correctly (I'm
doing some regression tests to understand whether to migrate to v. 1.0.6
or not).

Thanks for your suggestions,
Roberto
From yang.jiang.z at gmail.com  Sat Aug 28 05:46:48 2010
From: yang.jiang.z at gmail.com (Yang Jiang)
Date: Sat, 28 Aug 2010 20:46:48 +0800
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: 
References: <4C78BDDB.8050302@gmail.com> <4C78EEDF.1030102@gmail.com>
Message-ID: <4C790538.2060105@gmail.com>

Are you referring to these versions as they appear in the comment?

 *  Version 1.0.5 -- Terence, June 21, 2007
 *  --a[i].foo didn't work. Fixed unaryExpression
 *
 *  Version 1.0.6 -- John Ridgway, March 17, 2008
 *      Made "assert" a switchable keyword like "enum".
 *      Fixed compilationUnit to disallow "annotation importDeclaration ...".

The Java.g on
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g (let's
call it openjdk Java.g)
is different from the Version 1.0.5 or 1.0.6 you are referring to.
It is derived from the same source, though: the one written by Terence
(let's call it Terence Java.g).

If you read the comment in the openjdk Java.g carefully, you'll notice this
line: "Below are comments found in the original version." Down from there
is the comment from the Terence Java.g. It's kept that way to respect the
copyright.

The openjdk Java.g is developed at Sun and is VERY well tested. It also has
some significant changes compared to the Terence Java.g. One of the
changes is what you have noticed: the grammar does not handle the
unicode character representation. This section from the Terence Java.g
was taken out:

UnicodeEscape
    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
    ;

Other changes you have probably read about in the comment.

That said, although it's from the same source as your version 1.0.5, it
should not be considered a successor to 1.0.5 or 1.0.6. You might want to
do more research before upgrading.

yang
From robermann at gmail.com  Sat Aug 28 06:19:10 2010
From: robermann at gmail.com (Roberto Mannai)
Date: Sat, 28 Aug 2010 15:19:10 +0200
Subject: Does Java.g [version 1.0.6] handle unicode characters?
In-Reply-To: <4C790538.2060105@gmail.com>
References: <4C78BDDB.8050302@gmail.com> <4C78EEDF.1030102@gmail.com> <4C790538.2060105@gmail.com>
Message-ID: 

Hi Yang

Thanks, now I get it. Although I did read those comments, I could not infer
from them the "significant changes compared to the Terence Java.g":
standard comments and white space are skipped, and Unicode escapes are no
longer handled. IMHO they would be worth an explicit note in the file's
changelog section; otherwise they can only surface through a textual diff
against Terence's version.

Thanks again,
Roberto