From vincent.ryan at sun.com Mon Mar 1 18:00:23 2010
From: vincent.ryan at sun.com (vincent.ryan at sun.com)
Date: Mon, 01 Mar 2010 18:00:23 +0000
Subject: hg: jdk7/tl/jdk: 2 new changesets
Message-ID: <20100301180103.D0ACF41DBD@hg.openjdk.java.net>

Changeset: 78d91c4223cb
Author:    vinnie
Date:      2010-03-01 17:54 +0000
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/78d91c4223cb

6921001: api/java_security/IdentityScope/IdentityScopeTests.html#getSystemScope fails starting from b78 JDK7
Reviewed-by: mullan

! src/share/classes/java/security/IdentityScope.java
! src/share/lib/security/java.security
+ test/java/security/IdentityScope/NoDefaultSystemScope.java

Changeset: 893034df4ec2
Author:    vinnie
Date:      2010-03-01 18:00 +0000
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/893034df4ec2

Merge

- test/java/nio/file/WatchService/OverflowEventIsLoner.java

From joe.darcy at sun.com Tue Mar 2 21:57:06 2010
From: joe.darcy at sun.com (joe.darcy at sun.com)
Date: Tue, 02 Mar 2010 21:57:06 +0000
Subject: hg: jdk7/tl/langtools: 6931130: Remove unused AnnotationCollector code from JavacProcessingEnvironment
Message-ID: <20100302215709.B65C341F86@hg.openjdk.java.net>

Changeset: 7c23bbbe0dbd
Author:    darcy
Date:      2010-03-02 14:06 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/langtools/rev/7c23bbbe0dbd

6931130: Remove unused AnnotationCollector code from JavacProcessingEnvironment
Reviewed-by: jjg

!
src/share/classes/com/sun/tools/javac/processing/JavacProcessingEnvironment.java

From Ulf.Zibis at gmx.de Tue Mar 2 23:34:08 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 03 Mar 2010 00:34:08 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4A9578C4.8060801@sun.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com>
Message-ID: <4B8DA070.3040306@gmx.de>

On 26.08.2009 20:02, Xueming Shen wrote:
>
> For example, isBMP(int) might be convenient, but it can be
> easily achieved by the one-line code
>
> (int)(char)codePoint == codePoint;
>
> or the more readable form
>
> codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT;
>

In class sun.nio.cs.Surrogate we have:
    public static boolean isBMP(int uc) {
        return (int) (char) uc == uc;
    }

1.) It's enough to have:
        return (char)uc == uc;
    better:
        assert MIN_VALUE == 0 && MAX_VALUE == 0xFFFF;
        return (char)uc == uc;
        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE

2.) Above code is compiled to (needs 16 bytes of machine code):
  0x00b87ad8: mov    %ebx,%ebp
  0x00b87ada: and    $0xffff,%ebp
  0x00b87ae0: cmp    %ebx,%ebp
  0x00b87ae2: jne    0x00b87c52
  0x00b87ae8:

    We could code:
        assert MIN_VALUE == 0 && (MAX_VALUE + 1) == (1 << 16);
        return (uc >> 16) == 0;
        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE

    is compiled to (needs only 9 bytes of machine code):
  0x00b87aac: mov    %ebx,%ecx
  0x00b87aae: sar    $0x10,%ecx
  0x00b87ab1: test   %ecx,%ecx
  0x00b87ab3: je     0x00b87acb
  0x00b87ab5:

1.) If we have:
    public static boolean isSupplementaryCodePoint(int codePoint) {
        assert MIN_SUPPLEMENTARY_CODE_POINT == (1 << 16) &&
                (MAX_SUPPLEMENTARY_CODE_POINT + 1) % (1 << 16) == 0;
        return (codePoint >> 16) != 0
                && (codePoint >> 16) < (MAX_SUPPLEMENTARY_CODE_POINT + 1 >> 16);
        // Optimized form of: codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
        //                    && codePoint <= MAX_SUPPLEMENTARY_CODE_POINT;
    }
    and:
        if (Surrogate.isBMP(uc))
            ...;
        else if (Character.isSupplementaryCodePoint(uc))
            ...;
        else
            ...;

    we get (needs only 18 bytes of machine code):
  0x00b87aac: mov    %ebx,%ecx
  0x00b87aae: sar    $0x10,%ecx
  0x00b87ab1: test   %ecx,%ecx
  0x00b87ab3: je     0x00b87acb
  0x00b87ab5: cmp    $0x11,%ecx
  0x00b87ab8: jge    0x00b87ce6
  0x00b87abe:

-Ulf

From jonathan.gibbons at sun.com Wed Mar 3 00:41:43 2010
From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com)
Date: Wed, 03 Mar 2010 00:41:43 +0000
Subject: hg: jdk7/tl/langtools: 6931482: minor findbugs fixes
Message-ID: <20100303004146.B692D41FB5@hg.openjdk.java.net>

Changeset: 6e1e2738c530
Author:    jjg
Date:      2010-03-02 16:40 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/langtools/rev/6e1e2738c530

6931482: minor findbugs fixes
Reviewed-by: darcy

! src/share/classes/com/sun/tools/classfile/ConstantPool.java
! src/share/classes/com/sun/tools/javadoc/DocEnv.java
! src/share/classes/com/sun/tools/javadoc/SeeTagImpl.java

From jonathan.gibbons at sun.com Wed Mar 3 00:44:42 2010
From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com)
Date: Wed, 03 Mar 2010 00:44:42 +0000
Subject: hg: jdk7/tl/langtools: 6931127: strange test class files
Message-ID: <20100303004445.7B6E041FB6@hg.openjdk.java.net>

Changeset: 235135d61974
Author:    jjg
Date:      2010-03-02 16:43 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/langtools/rev/235135d61974

6931127: strange test class files
Reviewed-by: darcy

! test/tools/javac/annotations/neg/Constant.java
! test/tools/javac/generics/Casting.java
! test/tools/javac/generics/Casting3.java
! test/tools/javac/generics/Casting4.java
!
test/tools/javac/generics/InnerInterface1.java
! test/tools/javac/generics/InnerInterface2.java
! test/tools/javac/generics/Multibound1.java
! test/tools/javac/generics/MultipleInheritance.java
! test/tools/javac/generics/NameOrder.java
! test/tools/javac/generics/PermuteBound.java
! test/tools/javac/generics/PrimitiveVariant.java

From martinrb at google.com Wed Mar 3 08:00:05 2010
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 3 Mar 2010 00:00:05 -0800
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4B8DA070.3040306@gmx.de>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de>
Message-ID: <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com>

On Tue, Mar 2, 2010 at 15:34, Ulf Zibis wrote:
> On 26.08.2009 20:02, Xueming Shen wrote:
>>
>> For example, isBMP(int) might be convenient, but it can be easily
>> achieved by the one-line code
>>
>> (int)(char)codePoint == codePoint;
>>
>> or the more readable form
>>
>> codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT;
>>
>
> In class sun.nio.cs.Surrogate we have:
>    public static boolean isBMP(int uc) {
>        return (int) (char) uc == uc;
>    }
>
> 1.) It's enough to have:
>        return (char)uc == uc;
>    better:
>        assert MIN_VALUE == 0 && MAX_VALUE == 0xFFFF;
>        return (char)uc == uc;
>        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
>
> 2.) Above code is compiled to (needs 16 bytes of machine code):
>  0x00b87ad8: mov    %ebx,%ebp
>  0x00b87ada: and    $0xffff,%ebp
>  0x00b87ae0: cmp    %ebx,%ebp
>  0x00b87ae2: jne    0x00b87c52
>  0x00b87ae8:
>
>    We could code:
>        assert MIN_VALUE == 0 && (MAX_VALUE + 1) == (1 << 16);
>        return (uc >> 16) == 0;
>        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE

I agree that
    return (uc >> 16) == 0;
is marginally better than my
    return (int) (char) uc == uc;
(although I think the redundant cast to int
makes the code more readable).

I approve such a change to isBMPCodePoint()
and inclusion of such a method in Character.

>    is compiled to (needs only 9 bytes of machine code):
>  0x00b87aac: mov    %ebx,%ecx
>  0x00b87aae: sar    $0x10,%ecx
>  0x00b87ab1: test   %ecx,%ecx
>  0x00b87ab3: je     0x00b87acb
>  0x00b87ab5:
>
> 1.) If we have:
>    public static boolean isSupplementaryCodePoint(int codePoint) {
>        assert MIN_SUPPLEMENTARY_CODE_POINT == (1 << 16) &&
>                (MAX_SUPPLEMENTARY_CODE_POINT + 1) % (1 << 16) == 0;
>        return (codePoint >> 16) != 0
>                && (codePoint >> 16) < (MAX_SUPPLEMENTARY_CODE_POINT + 1 >> 16);
>        // Optimized form of: codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>        //                    && codePoint <= MAX_SUPPLEMENTARY_CODE_POINT;
>    }

Keep in mind that supplementary characters are extremely rare.

Therefore the existing implementation

    return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
        && codePoint <= MAX_CODE_POINT;

will almost always perform just one comparison against a constant,
which is hard to beat.

I'm not sure whether your code above gets the right answer for negative input.
Perhaps you need to do
    (codePoint >>> 16) < ...

Martin

>    and:
>        if (Surrogate.isBMP(uc))
>            ...;
>        else if (Character.isSupplementaryCodePoint(uc))
>            ...;
>        else
>            ...;
>
>    we get (needs only 18 bytes of machine code):
>  0x00b87aac: mov    %ebx,%ecx
>  0x00b87aae: sar    $0x10,%ecx
>  0x00b87ab1: test   %ecx,%ecx
>  0x00b87ab3: je     0x00b87acb
>  0x00b87ab5: cmp    $0x11,%ecx
>  0x00b87ab8: jge    0x00b87ce6
>  0x00b87abe:
>
> -Ulf
>

From Ulf.Zibis at gmx.de Wed Mar 3 10:44:51 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 03 Mar 2010 11:44:51 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com>
Message-ID: <4B8E3DA3.7090902@gmx.de>

On 03.03.2010 09:00, Martin Buchholz wrote:
> On Tue, Mar 2, 2010 at 15:34, Ulf Zibis wrote:
>
>> On 26.08.2009 20:02, Xueming Shen wrote:
>>
>>> For example, isBMP(int) might be convenient, but it can be easily
>>> achieved by the one-line code
>>>
>>> (int)(char)codePoint == codePoint;
>>>
>>> or the more readable form
>>>
>>> codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT;
>>>
>>
>> In class sun.nio.cs.Surrogate we have:
>>    public static boolean isBMP(int uc) {
>>        return (int) (char) uc == uc;
>>    }
>>
>> 1.) It's enough to have:
>>        return (char)uc == uc;
>>    better:
>>        assert MIN_VALUE == 0 && MAX_VALUE == 0xFFFF;
>>        return (char)uc == uc;
>>        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
>>
>> 2.) Above code is compiled to (needs 16 bytes of machine code):
>>  0x00b87ad8: mov    %ebx,%ebp
>>  0x00b87ada: and    $0xffff,%ebp
>>  0x00b87ae0: cmp    %ebx,%ebp
>>  0x00b87ae2: jne    0x00b87c52
>>  0x00b87ae8:
>>
>>    We could code:
>>        assert MIN_VALUE == 0 && (MAX_VALUE + 1) == (1 << 16);
>>        return (uc >> 16) == 0;
>>        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
>>
> I agree that
>     return (uc >> 16) == 0;
> is marginally better than my
>     return (int) (char) uc == uc;
> (although I think the redundant cast to int
> makes the code more readable).
>

That seems to be a matter of taste. I always stumble over superfluous
casts, wondering what they are there for.

> I approve such a change to isBMPCodePoint()
> and inclusion of such a method in Character.
>

Pleased! Who could file the bug? I would provide the patch.

>
>>    is compiled to (needs only 9 bytes of machine code):
>>  0x00b87aac: mov    %ebx,%ecx
>>  0x00b87aae: sar    $0x10,%ecx
>>  0x00b87ab1: test   %ecx,%ecx
>>  0x00b87ab3: je     0x00b87acb
>>  0x00b87ab5:
>>
>> 1.) If we have:
>>    public static boolean isSupplementaryCodePoint(int codePoint) {
>>        assert MIN_SUPPLEMENTARY_CODE_POINT == (1 << 16) &&
>>                (MAX_SUPPLEMENTARY_CODE_POINT + 1) % (1 << 16) == 0;
>>        return (codePoint >> 16) != 0
>>                && (codePoint >> 16) < (MAX_SUPPLEMENTARY_CODE_POINT + 1 >> 16);
>>        // Optimized form of: codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>        //                    && codePoint <= MAX_SUPPLEMENTARY_CODE_POINT;
>>    }
>>
> Keep in mind that supplementary characters are extremely rare.
>

Yes, but many APIs in the JDK are rarely used.
Why should they waste memory footprint or perform badly, particularly if
fixing that doesn't cost anything?

> Therefore the existing implementation
>
>     return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>         && codePoint <= MAX_CODE_POINT;
>
> will almost always perform just one comparison against a constant,
> which is hard to beat.
>

1. Wondering: I think there are TWO comparisons.
2. Those comparisons need to load 32-bit values from the machine code,
   against only 8-bit values in my case.
3. The first of the 2 comparisons is optimized away if compiled in
   combination with isBMPCodePoint(). (see below)

> I'm not sure whether your code above gets the right answer for negative input.
> Perhaps you need to do
>     (codePoint >>> 16) < ...
>

Oops, I'm afraid you are right, but fortunately it doesn't cost
anything: sar is simply replaced by shr.
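
A minimal sketch of the negative-input point, not the JDK's actual code: it compares the signed-shift (`>>`) and unsigned-shift (`>>>`) variants of the supplementary check against the plain range comparison. The class `ShiftCheck` and its method names are hypothetical and exist only for illustration; the two constants are the real ones from `java.lang.Character`.

```java
final class ShiftCheck {
    // Reference form: plain range comparison against the Character constants.
    static boolean isSupplementaryRange(int cp) {
        return cp >= Character.MIN_SUPPLEMENTARY_CODE_POINT
            && cp <= Character.MAX_CODE_POINT;
    }

    // Signed shift: a negative input stays negative after the shift,
    // so "plane < 0x11" holds and the method wrongly answers true.
    static boolean isSupplementarySigned(int cp) {
        int plane = cp >> 16;
        return plane != 0 && plane < 0x11;
    }

    // Unsigned shift: a negative input becomes a large positive plane number,
    // so the upper-bound test rejects it.
    static boolean isSupplementaryUnsigned(int cp) {
        int plane = cp >>> 16;
        return plane != 0 && plane < 0x11;
    }

    public static void main(String[] args) {
        int[] samples = {-1, 0, 0xFFFF, 0x10000, 0x10FFFF, 0x110000};
        for (int cp : samples) {
            System.out.printf("%8x range=%b signed=%b unsigned=%b%n",
                    cp,
                    isSupplementaryRange(cp),
                    isSupplementarySigned(cp),
                    isSupplementaryUnsigned(cp));
        }
    }
}
```

Only the row for -1 should differ between the signed and unsigned variants; for every non-negative input the two shifts produce the same plane number.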
> Martin
>
>
>>    and:
>>        if (Surrogate.isBMP(uc))
>>            ...;
>>        else if (Character.isSupplementaryCodePoint(uc))
>>            ...;
>>        else
>>            ...;
>>
>>    we get (needs only 18 bytes of machine code):
>>  0x00b87aac: mov    %ebx,%ecx
>>  0x00b87aae: sar    $0x10,%ecx
>>  0x00b87ab1: test   %ecx,%ecx
>>  0x00b87ab3: je     0x00b87acb
>>  0x00b87ab5: cmp    $0x11,%ecx
>>  0x00b87ab8: jge    0x00b87ce6
>>  0x00b87abe:
>>

BTW, the existing code is compiled to (needs *36* bytes of machine code):
  0x00b87a5c: test   %ebp,%ebp
  0x00b87a5e: jl     0x00b87a68
  0x00b87a60: cmp    $0x10000,%ebp
  0x00b87a66: jl     0x00b87a8d
  0x00b87a68: cmp    $0x10000,%ebp
  0x00b87a6e: jl     0x00b87c63
  0x00b87a74: cmp    $0x10ffff,%ebp
  0x00b87a7a: jg     0x00b87c63
  0x00b87a80:

BTW 2: The code example comes from one of the String constructors,
where the 2-comparison code is manually inlined instead of using
Surrogate.isBMP().

-Ulf

From martinrb at google.com Wed Mar 3 16:06:14 2010
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 3 Mar 2010 08:06:14 -0800
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4B8E3DA3.7090902@gmx.de>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de>
Message-ID: <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>

Sherman, would you like to file bugs for Ulf's improvements?

On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote:
> On 03.03.2010 09:00, Martin Buchholz wrote:

>> Keep in mind that supplementary characters are extremely rare.
>>
>
> Yes, but many APIs in the JDK are rarely used.
> Why should they waste memory footprint or perform badly, particularly if
> fixing that doesn't cost anything?

I admire your perfectionism.

>> Therefore the existing implementation
>>
>>     return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>         && codePoint <= MAX_CODE_POINT;
>>
>> will almost always perform just one comparison against a constant,
>> which is hard to beat.
>>
>
> 1. Wondering: I think there are TWO comparisons.
> 2. Those comparisons need to load 32-bit values from the machine code,
> against only 8-bit values in my case.

It's a good point. In the machine code, shifts are likely to use
immediate values, and so will be a small win.

    int x = codePoint >>> 16;
    return x != 0 && x < 0x11;

(On modern hardware, these optimizations
are less valuable than they used to be;
ordinary integer arithmetic is almost free)

Martin

From alan.bateman at sun.com Wed Mar 3 16:10:11 2010
From: alan.bateman at sun.com (alan.bateman at sun.com)
Date: Wed, 03 Mar 2010 16:10:11 +0000
Subject: hg: jdk7/tl/jdk: 6931216: TEST_BUG: test/java/nio/file/WatchService/LotsOfEvents.java failed with NPE
Message-ID: <20100303161050.CC14143C39@hg.openjdk.java.net>

Changeset: cddb43b12d28
Author:    alanb
Date:      2010-03-03 16:09 +0000
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/cddb43b12d28

6931216: TEST_BUG: test/java/nio/file/WatchService/LotsOfEvents.java failed with NPE
Reviewed-by: chegar

! test/java/nio/file/WatchService/LotsOfEvents.java

From Xueming.Shen at Sun.COM Wed Mar 3 19:11:40 2010
From: Xueming.Shen at Sun.COM (Xueming Shen)
Date: Wed, 03 Mar 2010 11:11:40 -0800
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>
Message-ID: <4B8EB46C.1010208@sun.com>

#6931812

Martin Buchholz wrote:
> Sherman, would you like to file bugs for Ulf's improvements?
>
> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote:
>
>> On 03.03.2010 09:00, Martin Buchholz wrote:
>>
>
>>> Keep in mind that supplementary characters are extremely rare.
>>>
>>
>> Yes, but many APIs in the JDK are rarely used.
>> Why should they waste memory footprint or perform badly, particularly if
>> fixing that doesn't cost anything?
>>
>
> I admire your perfectionism.
>
>>> Therefore the existing implementation
>>>
>>>     return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>         && codePoint <= MAX_CODE_POINT;
>>>
>>> will almost always perform just one comparison against a constant,
>>> which is hard to beat.
>>>
>>
>> 1. Wondering: I think there are TWO comparisons.
>> 2. Those comparisons need to load 32-bit values from the machine code,
>> against only 8-bit values in my case.
>>
>
> It's a good point. In the machine code, shifts are likely to use
> immediate values, and so will be a small win.
>
>     int x = codePoint >>> 16;
>     return x != 0 && x < 0x11;
>
> (On modern hardware, these optimizations
> are less valuable than they used to be;
> ordinary integer arithmetic is almost free)
>
> Martin
>

From kelly.ohair at sun.com Wed Mar 3 19:30:18 2010
From: kelly.ohair at sun.com (kelly.ohair at sun.com)
Date: Wed, 03 Mar 2010 19:30:18 +0000
Subject: hg: jdk7/tl/jdk: 6931763: sanity checks broken with latest cygwin, newer egrep -i option problems
Message-ID: <20100303193038.0881343C6C@hg.openjdk.java.net>

Changeset: 507159d8d143
Author:    ohair
Date:      2010-03-03 11:29 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/507159d8d143

6931763: sanity checks broken with latest cygwin, newer egrep -i option problems
Reviewed-by: jjg

! make/common/shared/Sanity.gmk

From Ulf.Zibis at gmx.de Wed Mar 3 21:31:07 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 03 Mar 2010 22:31:07 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>
Message-ID: <4B8ED51B.2030303@gmx.de>

On 03.03.2010 17:06, Martin Buchholz wrote:
> Sherman, would you like to file bugs for Ulf's improvements?
>

Thanks.

> I admire your perfectionism.
>

Really? :-)

> (On modern hardware, these optimizations
> are less valuable than they used to be;
> ordinary integer arithmetic is almost free)
>

IMHO even on modern hardware, half as many machine-code bytes should
run about twice as fast. ;-)

-Ulf

From joe.darcy at sun.com Thu Mar 4 00:05:47 2010
From: joe.darcy at sun.com (joe.darcy at sun.com)
Date: Thu, 04 Mar 2010 00:05:47 +0000
Subject: hg: jdk7/tl/langtools: 6449781: TypeElement.getQualifiedName for anonymous classes returns null instead of an empty name
Message-ID: <20100304000550.9A71443CAE@hg.openjdk.java.net>

Changeset: fc7132746501
Author:    darcy
Date:      2010-03-03 16:05 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/langtools/rev/fc7132746501

6449781: TypeElement.getQualifiedName for anonymous classes returns null instead of an empty name
Reviewed-by: jjg

!
src/share/classes/com/sun/tools/javac/jvm/ClassReader.java
+ test/tools/javac/processing/model/element/TestAnonClassNames.java
+ test/tools/javac/processing/model/element/TestAnonSourceNames.java

From jonathan.gibbons at sun.com Thu Mar 4 01:23:54 2010
From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com)
Date: Thu, 04 Mar 2010 01:23:54 +0000
Subject: hg: jdk7/tl/langtools: 6931927: position issues with synthesized anonymous class
Message-ID: <20100304012357.898BF43CC4@hg.openjdk.java.net>

Changeset: 7f5db2e8b423
Author:    jjg
Date:      2010-03-03 17:22 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/langtools/rev/7f5db2e8b423

6931927: position issues with synthesized anonymous class
Reviewed-by: darcy

! src/share/classes/com/sun/tools/javac/parser/JavacParser.java
+ test/tools/javac/tree/TestAnnotatedAnonClass.java
+ test/tools/javac/tree/TreePosTest.java
- test/tools/javac/treepostests/TreePosTest.java

From kevin.l.stern at gmail.com Thu Mar 4 01:41:26 2010
From: kevin.l.stern at gmail.com (Kevin L. Stern)
Date: Wed, 3 Mar 2010 19:41:26 -0600
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
Message-ID: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>

Greetings,

I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
java.io.ByteArrayOutputStream which arise when the capacities of the data
structures reach a particular threshold. More below.

When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE, its size
reaches its capacity, and an add or an insert operation is invoked, the
capacity is increased by only one element. Notice that in the following
excerpt from ArrayList.ensureCapacity the new capacity is set to
(3/2) * oldCapacity + 1 unless this value would not suffice to accommodate
the required capacity, in which case it is set to the required capacity.
If the current capacity is at least (2/3)*Integer.MAX_VALUE, then
(oldCapacity * 3)/2 + 1 overflows and resolves to a negative number,
resulting in the new capacity being set to the required capacity. The major
consequence of this is that each subsequent add/insert operation results in
a full resize of the ArrayList, causing performance to degrade significantly.

        int newCapacity = (oldCapacity * 3)/2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;

Hashtable breaks entirely when the size of its backing array reaches
(1/2) * Integer.MAX_VALUE and a rehash is necessary, as is evident from the
following excerpt from rehash. Notice that rehash will attempt to create an
array of negative size if the size of the backing array reaches
(1/2) * Integer.MAX_VALUE, since oldCapacity * 2 + 1 overflows and resolves
to a negative number.

        int newCapacity = oldCapacity * 2 + 1;
        HashtableEntry newTable[] = new HashtableEntry[newCapacity];

When the capacity of the backing array in a ByteArrayOutputStream reaches
(1/2) * Integer.MAX_VALUE, its size reaches its capacity, and a write
operation is invoked, the capacity of the backing array is increased only by
the required number of elements. Notice that in the following excerpt from
ByteArrayOutputStream.write(int) the new backing array capacity is set to
2 * buf.length unless this value would not suffice to accommodate the
required capacity, in which case it is set to the required capacity. If the
current backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1,
then buf.length << 1 overflows and resolves to a negative number, resulting
in the new capacity being set to the required capacity. The major
consequence of this, like with ArrayList, is that each subsequent write
operation results in a full resize of the ByteArrayOutputStream, causing
performance to degrade significantly.

        int newcount = count + 1;
        if (newcount > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
        }

It is interesting to note that any statements about the amortized time
complexity of add/insert operations, such as the one in the ArrayList
javadoc, are invalidated by the performance-related bugs. One solution to
the above situations is to set the new capacity of the backing array to
Integer.MAX_VALUE when the initial size calculation results in a negative
number during a resize.

Apologies if these bugs are already known.

Regards,

Kevin

From jason_mehrens at hotmail.com Thu Mar 4 02:08:46 2010
From: jason_mehrens at hotmail.com (Jason Mehrens)
Date: Wed, 3 Mar 2010 20:08:46 -0600
Subject: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)
In-Reply-To: <4B8A952B.1010007@gmx.de>
References: <4B86B8A4.8050709@sun.com> <4B86BF2E.8030208@sun.com>,<4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de>,<4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com>,<4B8A952B.1010007@gmx.de>
Message-ID:

String.hash should only have two known states, zero and the actual computed hash code.

http://bugs.sun.com/view_bug.do?bug_id=6611830

Jason

> Date: Sun, 28 Feb 2010 17:09:15 +0100
> From: Ulf.Zibis at gmx.de
> To: Alan.Bateman at Sun.COM
> Subject: Re: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)
> CC: core-libs-dev at openjdk.java.net; dmitry.nadezhin at gmail.com; Kelly.Ohair at Sun.COM
>
> Another thought:
>
> In the constructors of String we could initialize hash =
> Integer.MIN_VALUE except if length == 0.
> Then we could stay at the fastest version:
>
>    public int hashCode() {
>        int h = hash;
>        if (h == Integer.MIN_VALUE) {
>            h = 0;
>            char[] val = value;
>            for (int i = offset, limit = count + i; i != limit; )
>                h = 31 * h + val[i++];
>            hash = h;
>        }
>        return h;
>    }

From weijun.wang at sun.com Thu Mar 4 02:40:26 2010
From: weijun.wang at sun.com (weijun.wang at sun.com)
Date: Thu, 04 Mar 2010 02:40:26 +0000
Subject: hg: jdk7/tl/jdk: 3 new changesets
Message-ID: <20100304024122.AFDFC43CD7@hg.openjdk.java.net>

Changeset: 61c298558549
Author:    weijun
Date:      2010-03-04 10:37 +0800
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/61c298558549

6844909: support allow_weak_crypto in krb5.conf
Reviewed-by: valeriep

! src/share/classes/sun/security/krb5/internal/crypto/EType.java
+ test/sun/security/krb5/etype/WeakCrypto.java
+ test/sun/security/krb5/etype/weakcrypto.conf

Changeset: 0f383673ce31
Author:    weijun
Date:      2010-03-04 10:38 +0800
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/0f383673ce31

6923681: Jarsigner crashes during timestamping
Reviewed-by: vinnie

! src/share/classes/sun/security/tools/TimestampedSigner.java

Changeset: 5e15b70e6d27
Author:    weijun
Date:      2010-03-04 10:38 +0800
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/5e15b70e6d27

6880321: sun.security.provider.JavaKeyStore abuse of OOM Exception handling
Reviewed-by: xuelei

!
src/share/classes/sun/security/provider/JavaKeyStore.java

From jonathan.gibbons at sun.com Thu Mar 4 03:35:50 2010
From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com)
Date: Thu, 04 Mar 2010 03:35:50 +0000
Subject: hg: jdk7/tl/langtools: 6931126: jtreg tests not Windows friendly
Message-ID: <20100304033552.E838743CE5@hg.openjdk.java.net>

Changeset: 117c95448ab9
Author:    jjg
Date:      2010-03-03 19:34 -0800
URL:       http://hg.openjdk.java.net/jdk7/tl/langtools/rev/117c95448ab9

6931126: jtreg tests not Windows friendly
Reviewed-by: darcy

! test/tools/javac/ThrowsIntersection_1.java
! test/tools/javac/ThrowsIntersection_2.java
! test/tools/javac/ThrowsIntersection_3.java
! test/tools/javac/ThrowsIntersection_4.java
! test/tools/javac/generics/NameOrder.java

From develop4lasu at gmail.com Thu Mar 4 18:33:43 2010
From: develop4lasu at gmail.com (Marek Kozieł)
Date: Thu, 4 Mar 2010 19:33:43 +0100
Subject: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)
In-Reply-To: <4B8A952B.1010007@gmx.de>
References: <4B86B8A4.8050709@sun.com> <4B86BF2E.8030208@sun.com> <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de> <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com> <4B8A952B.1010007@gmx.de>
Message-ID: <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com>

2010/2/28 Ulf Zibis :
> On 25.02.2010 23:07, Alan Bateman wrote:
>>
>> Kelly O'Hair wrote:
>>>
>>> Yup. My eyes must be tired, I didn't see that. :^(
>>
>> Too many repositories in the air at the same time. The webrev has been
>> refreshed. Thanks Ulf.
>>
>>
>
> Another thought:
>
> In the constructors of String we could initialize hash = Integer.MIN_VALUE
> except if length == 0.
> Then we could stay at the fastest version:
>
>    public int hashCode() {
>        int h = hash;
>        if (h == Integer.MIN_VALUE) {
>            h = 0;
>            char[] val = value;
>            for (int i = offset, limit = count + i; i != limit; )
>                h = 31 * h + val[i++];
>            hash = h;
>        }
>        return h;
>    }
>
> As an alternative we could use:
>    private static final int UNKNOWN_HASH = 1;
> Justification:
> Using a small value results in slightly shorter byte code and machine code
> footprint after compilation.
> Additionally, on some CPUs this will likely perform slightly better, but
> never worse.
>
> Please note:
> Original loop causes 2 values to increment:
>            for (int i = 0; i < len; i++) {
>                h = 31*h + val[off++];
>            }
> This is inefficient, as I have shown in a small micro-benchmark.
>
> -Ulf
>
>

Hello,
I would suggest:

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = 0;
            char[] val = value;
            for (int i = offset, limit = count + i; i != limit; )
                h = 31 * h + val[i++];
            if (h == 0)
                h++;
            hash = h;
        }
        return h;
    }

But personally I would consider:
1. making hash a long
2. changing the way it is generated to ensure that:
-- in most cases String.concat(...) would be able to determine the new
hash from the substring hashes, so it could always be set in the
constructor (with a little effort that is possible now);
-- it would contain a flag (bit) that tells us whether the hash is a bijection

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;

            if (hash != anotherString.hash) return false;
            if ((hash & isHashBijection) != 0) return true;

            int n = count;
            if (n == anotherString.count) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

As you know, this would require a lot of work and is probably not
worth the effect.
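
The claim that a concatenation's hash can be derived from the hashes of its parts does hold for the polynomial h = 31 * h + c that String.hashCode() specifies: hash(a + b) = hash(a) * 31^length(b) + hash(b), in wrapping int arithmetic. A minimal sketch of that identity follows; `ConcatHash`, `pow31` and `concatHash` are hypothetical names for illustration, not JDK API.

```java
final class ConcatHash {
    // 31^n in wrapping 32-bit arithmetic, computed by repeated squaring.
    static int pow31(int n) {
        int result = 1;
        int base = 31;
        while (n > 0) {
            if ((n & 1) != 0) {
                result *= base;
            }
            base *= base;
            n >>>= 1;
        }
        return result;
    }

    // Hash of the concatenation a + b, derived only from hash(a), hash(b) and b's length.
    static int concatHash(int hashA, int hashB, int lengthB) {
        return hashA * pow31(lengthB) + hashB;
    }

    public static void main(String[] args) {
        String a = "core-libs";
        String b = "-dev";
        int derived = concatHash(a.hashCode(), b.hashCode(), b.length());
        // Compare the derived value with the hash of the real concatenation.
        System.out.println(derived == (a + b).hashCode());
    }
}
```

Note that the multiplier applies to the left operand's hash and depends on the right operand's length, because the first character of a string carries the highest power of 31.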

Notice one more thing: if we were able to know whether a String is the
interned version, equals could look like:

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            if (isIntern() && anotherString.isIntern())
                return false; // we checked it at first line already

            int n = count;
            if (n == anotherString.count) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

This solution would make .intern() more powerful: once someone optimises an
application and uses interned strings, it would improve speed and memory
usage. It also has no negative impact like calculating the hash in the
constructor. The problem is where this information should be stored.
(I have some ideas about it, but I doubt they would be accepted.)

--
Pozdrowionka. / Regards.
Lasu aka Marek Kozieł

http://lasu2string.blogspot.com/

From Ulf.Zibis at gmx.de Thu Mar 4 20:04:24 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Thu, 04 Mar 2010 21:04:24 +0100
Subject: Tune String's hashCode() + equals() [was: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)]
In-Reply-To: <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com>
References: <4B86B8A4.8050709@sun.com> <4B86BF2E.8030208@sun.com> <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de> <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com> <4B8A952B.1010007@gmx.de> <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com>
Message-ID: <4B901248.8020408@gmx.de>

On 04.03.2010 19:33, Marek Kozieł wrote:
> Hello,
> I would suggest:
>    public int hashCode() {
>        int h = hash;
>        if (h == 0) {
>            h = 0;
>            char[] val = value;
>            for (int i = offset, limit = count + i; i != limit; )
>                h = 31 * h + val[i++];
>            if (h == 0)
>                h++;
>            hash = h;
>        }
>        return h;
>    }
>

Interesting alternative, but I'm afraid this is against the spec.
Shifting all 0's to 1 would break String's hash definition: h = 31 * h +
val[i++].

> But personally I would consider:
> 1. making hash a long
> 2. changing the way it is generated to ensure that:
> -- in most cases String.concat(...) would be able to determine the new
> hash from the substring hashes, so it could always be set in the
> constructor (with a little effort that is possible now);
> -- it would contain a flag (bit) that tells us whether the hash is a bijection
>
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String anotherString = (String)anObject;
>
>            if (hash != anotherString.hash) return false;
>

only valid if (hash != 0 && anotherString.hash != 0)

>            if ((hash & isHashBijection) != 0) return true;
>

Which integer value should isHashBijection have?

>            int n = count;
>            if (n == anotherString.count) {
>                char v1[] = value;
>                char v2[] = anotherString.value;
>                int i = offset;
>                int j = anotherString.offset;
>                while (n-- != 0) {
>                    if (v1[i++] != v2[j++])
>                        return false;
>                }
>                return true;
>            }
>        }
>        return false;
>    }
>
> As you know, this would require a lot of work and is probably not
> worth the effect.
>
>
> Notice one more thing: if we were able to know whether a String is the
> interned version, equals could look like:
>
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String anotherString = (String)anObject;
>            if (isIntern() && anotherString.isIntern())
>                return false; // we checked it at first line already
>
>            int n = count;
>            if (n == anotherString.count) {
>                char v1[] = value;
>                char v2[] = anotherString.value;
>                int i = offset;
>                int j = anotherString.offset;
>                while (n-- != 0) {
>                    if (v1[i++] != v2[j++])
>                        return false;
>                }
>                return true;
>            }
>        }
>        return false;
>    }
>

Interned strings already have their hashes computed, to organize them in
the internal hash map.
Unfortunately those hashes are not back-propagated to the Java object, so
equals() can't benefit from them for now.

-Ulf

From develop4lasu at gmail.com Thu Mar 4 20:31:20 2010
From: develop4lasu at gmail.com (Marek Kozieł)
Date: Thu, 4 Mar 2010 21:31:20 +0100
Subject: Tune String's hashCode() + equals() [was: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)]
In-Reply-To: <4B901248.8020408@gmx.de>
References: <4B86B8A4.8050709@sun.com> <4B86BF2E.8030208@sun.com> <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de> <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com> <4B8A952B.1010007@gmx.de> <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com> <4B901248.8020408@gmx.de>
Message-ID: <28bca0ff1003041231n691bd44difcced567be841cd8@mail.gmail.com>

@Ulf
A few explanations:

1.
> Interesting alternative, but I'm afraid, this is against the spec.
> Shifting all 0's to 1 would break String's hash definition: h = 31 * h + val[i++].

Yes, it does; anyway, I think the spec is too tight here. Do we really need
a hash over every character even if the String has a length like 600000?
Nothing good comes from it, in my opinion.
Has anyone seen even one piece of code relying on that?
Btw., the same goes for .compare

2.
private static final long isHashBijection = 0x8000000000000000L;
should be fine

3.
The second sample would work only if the hash were set in the constructor, so
even 0 would be a valid hash.

4.
Maybe I'm wrong, but if in most cases the hash has to be computed for String
concatenation, then we could compute it as, more or less:
> a.hash+b.hash*powderOf31[a.length()]
so it would not consume so much time.

5.
Interned strings do not need hash codes for comparison because they have the
same address, so the first check would return true if they are equal; after
this we only need to check whether they are not equal:
> if (isIntern() && anotherString.isIntern()) return false;

6.
Keep in mind that adding additional fields to String might not be an
option, because the memory lost this way may have a great impact on average
application efficiency.

--
Pozdrowionka. / Regards.
Lasu aka Marek Kozieł

http://lasu2string.blogspot.com/

From Ulf.Zibis at gmx.de Thu Mar 4 21:10:18 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Thu, 04 Mar 2010 22:10:18 +0100
Subject: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)
In-Reply-To:
References: <4B86B8A4.8050709@sun.com> <4B86BF2E.8030208@sun.com>, <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de>, <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com>, <4B8A952B.1010007@gmx.de>
Message-ID: <4B9021BA.3060800@gmx.de>

On 04.03.2010 03:08, Jason Mehrens wrote:
> String.hash should only have two known states, zero and the actual
> computed hash code.
>
> http://bugs.sun.com/view_bug.do?bug_id=6611830

In theory, yes. But have you read the evaluation?
"This bug pattern is endemic in the JDK sources."

-Ulf

>
> Jason
>
> > Date: Sun, 28 Feb 2010 17:09:15 +0100
> > From: Ulf.Zibis at gmx.de
> > To: Alan.Bateman at Sun.COM
> > Subject: Re: Need reviewer for forward port of 6815768 (File.getXXXSpace) and 6815768 (String.hashCode)
> > CC: core-libs-dev at openjdk.java.net; dmitry.nadezhin at gmail.com; Kelly.Ohair at Sun.COM
> >
> > Another thought:
> >
> > In the constructors of String we could initialize hash =
> > Integer.MIN_VALUE except if length == 0.
> > Then we could stay at the fastest version:
> >
> > public int hashCode() {
> >     int h = hash;
> >     if (h == Integer.MIN_VALUE) {
> >         h = 0;
> >         char[] val = value;
> >         for (int i = offset, limit = count + i; i != limit; )
> >             h = 31 * h + val[i++];
> >         hash = h;
> >     }
> >     return h;
> > }
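
A minimal sketch of the sentinel idea quoted above, for illustration only.
The class name SentinelHashString and the constant UNSET are hypothetical;
this is not JDK code, and it wraps a char[] rather than changing
java.lang.String. It assumes the same polynomial hash as String.hashCode().

```java
// Hypothetical illustration of caching a hash with a non-zero sentinel.
final class SentinelHashString {
    // Assumed sentinel meaning "hash not computed yet" (from the quoted proposal).
    private static final int UNSET = Integer.MIN_VALUE;

    private final char[] value;
    // Racy single-check cache: a benign race, every thread computes the same value.
    private int hash;

    SentinelHashString(String s) {
        value = s.toCharArray();
        // The empty string has hash 0, which is known without any loop.
        hash = (value.length == 0) ? 0 : UNSET;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == UNSET) {
            h = 0;
            for (char c : value) {
                h = 31 * h + c; // same definition as String.hashCode()
            }
            hash = h;
        }
        return h;
    }
}
```

Note that this only moves the weak spot: a string whose real hash happens to
be Integer.MIN_VALUE is recomputed on every call, just as a string hashing to
0 is with the zero sentinel.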
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Ulf.Zibis at gmx.de Thu Mar 4 21:36:05 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Thu, 04 Mar 2010 22:36:05 +0100
Subject: Tune String's hashCode() + equals()
In-Reply-To: <28bca0ff1003041231n691bd44difcced567be841cd8@mail.gmail.com>
References: <4B86B8A4.8050709@sun.com> <4B86BF2E.8030208@sun.com> <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de> <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com> <4B8A952B.1010007@gmx.de> <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com> <4B901248.8020408@gmx.de> <28bca0ff1003041231n691bd44difcced567be841cd8@mail.gmail.com>
Message-ID: <4B9027C5.6030304@gmx.de>

Many thanks for your effort.

On 04.03.2010 21:31, Marek Kozieł wrote:
> @Ulf
> A few explanations:
> 1.
>
>> Interesting alternative, but I'm afraid, this is against the spec.
>> Shifting all 0's to 1 would break String's hash definition: h = 31 * h + val[i++].
>>
> Yes, it does; anyway, I think the spec is too tight here. Do we really need
> a hash over every character even if the String has a length like 600000?
> Nothing good comes from it, in my opinion.
> Has anyone seen even one piece of code relying on that?
> Btw., the same goes for .compare
>

See the discussion on the Project Coin list, subject "Benefit from computing
String Hash at compile time?"

> 2.
> private static final long isHashBijection = 0x8000000000000000L;
> should be fine
>

Now I understand how it should work. Your algorithm will guarantee only 2^63
bijective values. So strings of length >= 4 can't have bijective hashes, as
4 chars have 2^64 variations.

> 3.
> The second sample would work only if the hash were set in the constructor, so
> even 0 would be a valid hash.
>
> 4.
> Maybe I'm wrong, but if in most cases the hash has to be computed for String
> concatenation, then we could compute it as, more or less:
>
>> a.hash+b.hash*powderOf31[a.length()]
>>
> so it would not consume so much time.
>

Interesting idea. I suggest filing an RFE.

> 5.
> Interned strings do not need hash codes for comparison because they have the
> same address, so the first check would return true if they are equal; after
> this we only need to check whether they are not equal:
>
>> if (isIntern() && anotherString.isIntern()) return false;
>>

You are right, but
    if (h1 != 0 && h2 != 0 && h1 != h2) return false;
would perform the same (if the already computed internal hash were
back-propagated to the Java object).

> 6.
> Keep in mind that adding additional fields to String might not be an
> option, because the memory lost this way may have a great impact on average
> application efficiency.
>

+ would break compatibility if objects are serialized.

-Ulf

From develop4lasu at gmail.com Thu Mar 4 22:38:52 2010
From: develop4lasu at gmail.com (Marek Kozieł)
Date: Thu, 4 Mar 2010 23:38:52 +0100
Subject: Tune String's hashCode() + equals()
In-Reply-To: <4B9027C5.6030304@gmx.de>
References: <4B86B8A4.8050709@sun.com> <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de> <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com> <4B8A952B.1010007@gmx.de> <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com> <4B901248.8020408@gmx.de> <28bca0ff1003041231n691bd44difcced567be841cd8@mail.gmail.com> <4B9027C5.6030304@gmx.de>
Message-ID: <28bca0ff1003041438m75c87d0fk30b3e1e097e1efb4@mail.gmail.com>

2010/3/4 Ulf Zibis:
>> 5.
>> Interned strings do not need hash codes for comparison because they have the
>> same address, so the first check would return true if they are equal; after
>> this we only need to check whether they are not equal:
>>
>>>
>>> if (isIntern() && anotherString.isIntern()) return false;
>>>
>
> You are right, but
>     if (h1 != 0 && h2 != 0 && h1 != h2) return false;
> would perform the same (if the already computed internal hash were
> back-propagated to the Java object).
>

Could you explain what you mean?
If you are looking for optimizations, I suggest (if it's not already
partially implemented) adding:
public static String String.intern(String str, int waste)
which would work like String.intern(String str), except that if the intern
table already contains an 'other' String such that:
str.startsWith(other) && other.length()
References: <4B86B8A4.8050709@sun.com> <4B86CAC5.4050405@sun.com> <4B86DAE1.5050208@gmx.de> <4B86DBDC.9090703@sun.com> <4B86F490.60704@sun.com> <4B8A952B.1010007@gmx.de> <28bca0ff1003041033j2547fe95l80614ddf38799331@mail.gmail.com> <4B901248.8020408@gmx.de> <28bca0ff1003041231n691bd44difcced567be841cd8@mail.gmail.com> <4B9027C5.6030304@gmx.de> <28bca0ff1003041438m75c87d0fk30b3e1e097e1efb4@mail.gmail.com>
Message-ID: <4B90422C.6070705@gmx.de>

On 04.03.2010 23:38, Marek Kozieł wrote:
> 2010/3/4 Ulf Zibis:
>
>
>>> 5.
>>> Interned strings do not need hash codes for comparison because they have the
>>> same address, so the first check would return true if they are equal; after
>>> this we only need to check whether they are not equal:
>>>
>>>
>>>> if (isIntern() && anotherString.isIntern()) return false;
>>>>
>>>>
>> You are right, but
>>     if (h1 != 0 && h2 != 0 && h1 != h2) return false;
>> would perform the same (if the already computed internal hash were
>> back-propagated to the Java object).
>>
>>
> Could you explain what you mean?
>

h1 = this.hash;
h2 = otherString.hash;

See:

In hotspot/src/share/vm/prims/jvm.cpp :

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END

In hotspot/src/share/vm/classfile/symbolTable.cpp :

oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop string = the_table()->lookup(index, name, len, hashValue);

  // Found
  if (string != NULL) return string;

  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string_or_null, name, len,
                                hashValue, CHECK_NULL);
}

int StringTable::hash_string(jchar* s, int len) {
  unsigned h = 0;
  while (len-- > 0) {
    h = 31*h + (unsigned) *s;
    s++;
  }
  return h;
}

From lana.steuck at sun.com Fri Mar 5 06:46:13 2010
From: lana.steuck at sun.com (lana.steuck at sun.com)
Date: Fri, 05 Mar 2010 06:46:13 +0000
Subject: hg: jdk7/tl: Added tag jdk7-b84 for changeset 2f3ea057d1ad
Message-ID: <20100305064613.4D93E43E6F@hg.openjdk.java.net>

Changeset: cf26288a114b
Author: mikejwre
Date: 2010-02-18 13:31 -0800
URL: http://hg.openjdk.java.net/jdk7/tl/rev/cf26288a114b

Added tag jdk7-b84 for changeset 2f3ea057d1ad

! .hgtags

From lana.steuck at sun.com Fri Mar 5 06:46:19 2010
From: lana.steuck at sun.com (lana.steuck at sun.com)
Date: Fri, 05 Mar 2010 06:46:19 +0000
Subject: hg: jdk7/tl/corba: Added tag jdk7-b84 for changeset 68c8961a82e4
Message-ID: <20100305064620.B3DA043E70@hg.openjdk.java.net>

Changeset: c67a9df7bc0c
Author: mikejwre
Date: 2010-02-18 13:31 -0800
URL: http://hg.openjdk.java.net/jdk7/tl/corba/rev/c67a9df7bc0c

Added tag jdk7-b84 for changeset 68c8961a82e4

!
.hgtags From lana.steuck at sun.com Fri Mar 5 06:48:40 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 05 Mar 2010 06:48:40 +0000 Subject: hg: jdk7/tl/hotspot: 27 new changesets Message-ID: <20100305065004.1989143E71@hg.openjdk.java.net> Changeset: 125eb6a9fccf Author: mikejwre Date: 2010-02-18 13:31 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/125eb6a9fccf Added tag jdk7-b84 for changeset ffc8d176b84b ! .hgtags Changeset: 745c853ee57f Author: johnc Date: 2010-01-29 14:51 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/745c853ee57f 6885297: java -XX:RefDiscoveryPolicy=2 or -XX:TLABWasteTargetPercent=0 cause VM crash Summary: Interval checking is now being performed on the values passed in for these two flags. The current acceptable range for RefDiscoveryPolicy is [0..1], and for TLABWasteTargetPercent it is [1..100]. Reviewed-by: apetrusenko, ysr ! src/share/vm/includeDB_core ! src/share/vm/memory/referenceProcessor.hpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/arguments.hpp Changeset: 6484c4ee11cb Author: ysr Date: 2010-02-01 17:29 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/6484c4ee11cb 6904516: More object array barrier fixes, following up on 6906727 Summary: Fixed missing pre-barrier calls for G1, modified C1 to call pre- and correct post-barrier interfaces, deleted obsolete interface, (temporarily) disabled redundant deferred barrier in BacktraceBuilder. Reviewed-by: coleenp, jmasa, kvn, never ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/classfile/javaClasses.cpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp ! src/share/vm/memory/barrierSet.hpp ! src/share/vm/memory/barrierSet.inline.hpp ! src/share/vm/runtime/stubRoutines.cpp Changeset: deada8912c54 Author: johnc Date: 2010-02-02 18:39 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/deada8912c54 6914402: G1: assert(!is_young_card(cached_ptr),"shouldn't get a card in young region") Summary: Invalid assert. 
Filter cards evicted from the card count cache instead. Reviewed-by: apetrusenko, tonyp ! src/share/vm/gc_implementation/g1/concurrentG1Refine.cpp Changeset: 230fac611b50 Author: johnc Date: 2010-02-08 09:58 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/230fac611b50 Merge ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/includeDB_core Changeset: 455df1b81409 Author: kamg Date: 2010-02-08 13:49 -0500 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/455df1b81409 6587322: dtrace probe object__alloc doesn't fire in some situations on amd64 Summary: Fix misplaced probe point Reviewed-by: rasbold, phh Contributed-by: neojia at gmail.com ! src/cpu/x86/vm/templateTable_x86_64.cpp Changeset: 95d21201c29a Author: apangin Date: 2010-02-11 10:48 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/95d21201c29a Merge Changeset: 3f5b7efb9642 Author: never Date: 2010-02-05 11:07 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/3f5b7efb9642 6920293: OptimizeStringConcat causing core dumps Reviewed-by: kvn, twisti ! src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/opto/stringopts.cpp ! src/share/vm/runtime/sharedRuntime.cpp Changeset: 576e77447e3c Author: kvn Date: 2010-02-07 12:15 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/576e77447e3c 6923002: assert(false,"this call site should not be polymorphic") Summary: Clear the total count when a receiver information is cleared. Reviewed-by: never, jrose ! src/cpu/sparc/vm/c1_LIRAssembler_sparc.cpp ! src/cpu/sparc/vm/interp_masm_sparc.cpp ! src/cpu/sparc/vm/sharedRuntime_sparc.cpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/cpu/x86/vm/interp_masm_x86_32.cpp ! src/cpu/x86/vm/interp_masm_x86_64.cpp ! src/share/vm/ci/ciMethod.cpp ! src/share/vm/oops/methodDataOop.hpp ! src/share/vm/opto/doCall.cpp ! src/share/vm/opto/runtime.cpp ! 
src/share/vm/runtime/arguments.cpp Changeset: f516d5d7a019 Author: kvn Date: 2010-02-08 12:20 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/f516d5d7a019 6910605: C2: NullPointerException/ClassCaseException is thrown when C2 with DeoptimizeALot is used Summary: Set the reexecute bit for runtime calls _new_array_Java when they used for _multianewarray bytecode. Reviewed-by: never ! src/share/vm/code/pcDesc.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/parse3.cpp + test/compiler/6910605/Test.java Changeset: f70b0d9ab095 Author: kvn Date: 2010-02-09 01:31 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/f70b0d9ab095 6910618: C2: Error: assert(d->is_oop(),"JVM_ArrayCopy: dst not an oop") Summary: Mark in PcDesc call sites which return oop and save the result oop across objects reallocation during deoptimization. Reviewed-by: never ! src/share/vm/c1/c1_IR.hpp ! src/share/vm/code/debugInfoRec.cpp ! src/share/vm/code/debugInfoRec.hpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/code/pcDesc.hpp ! src/share/vm/code/scopeDesc.cpp ! src/share/vm/code/scopeDesc.hpp ! src/share/vm/includeDB_core ! src/share/vm/opto/output.cpp ! src/share/vm/prims/jvmtiCodeBlobEvents.cpp ! src/share/vm/runtime/deoptimization.cpp + test/compiler/6910618/Test.java Changeset: 4ee1c645110e Author: kvn Date: 2010-02-09 10:21 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/4ee1c645110e 6924097: assert((_type == Type::MEMORY) == (_adr_type != 0),"adr_type for memory phis only") Summary: Use PhiNode::make_blank(r, n) method to construct the phi. Reviewed-by: never ! src/share/vm/opto/loopopts.cpp Changeset: e3a4305c6bc3 Author: kvn Date: 2010-02-12 08:54 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/e3a4305c6bc3 6925249: assert(last_sp < (intptr_t*) interpreter_frame_monitor_begin(),"bad tos") Summary: Fix assert since top deoptimized frame has last_sp == interpreter_frame_monitor_begin if there are no expressions. Reviewed-by: twisti ! 
src/cpu/x86/vm/frame_x86.inline.hpp ! src/share/vm/runtime/deoptimization.cpp ! src/share/vm/runtime/frame.cpp ! src/share/vm/runtime/vframeArray.cpp Changeset: c09ee209b65c Author: kvn Date: 2010-02-12 10:34 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/c09ee209b65c 6926048: Improve Zero performance Summary: Make Zero figure out result types in a similar way to C++ interpreter implementation. Reviewed-by: kvn Contributed-by: gbenson at redhat.com ! src/cpu/zero/vm/cppInterpreter_zero.cpp ! src/cpu/zero/vm/cppInterpreter_zero.hpp Changeset: 7b4415a18c8a Author: kvn Date: 2010-02-12 15:27 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/7b4415a18c8a Merge ! src/cpu/sparc/vm/c1_LIRAssembler_sparc.cpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/share/vm/includeDB_core ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/runtime.cpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/sharedRuntime.cpp Changeset: 38836cf1d8d2 Author: tonyp Date: 2010-02-05 11:05 -0500 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/38836cf1d8d2 6920977: G1: guarantee(k == probe->klass(),"klass should be in dictionary") fails Summary: the guarantee is too strict and the test will fail (incorrectly) if the class is not in the system dictionary but in the placeholders. Reviewed-by: acorn, phh ! src/share/vm/classfile/loaderConstraints.cpp ! src/share/vm/classfile/loaderConstraints.hpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/includeDB_core Changeset: 9eee977dd1a9 Author: tonyp Date: 2010-02-08 14:23 -0500 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/9eee977dd1a9 6802453: G1: hr()->is_in_reserved(from),"Precondition." Summary: The operations of re-using a RSet component and expanding the same RSet component were not mutually exlusive, and this could lead to RSets getting corrupted and entries being dropped. Reviewed-by: iveresov, johnc ! 
src/share/vm/gc_implementation/g1/heapRegionRemSet.cpp Changeset: 8859772195c6 Author: johnc Date: 2010-02-09 13:56 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/8859772195c6 6782663: Data produced by PrintGCApplicationConcurrentTime and PrintGCApplicationStoppedTime is not accurate. Summary: Update and display the timers associated with these flags for all safepoints. Reviewed-by: ysr, jcoomes ! src/share/vm/runtime/vmThread.cpp ! src/share/vm/services/runtimeService.cpp Changeset: 0414c1049f15 Author: iveresov Date: 2010-02-11 15:52 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/0414c1049f15 6923991: G1: improve scalability of RSet scanning Summary: Implemented block-based work stealing. Moved copying during the rset scanning phase to the main copying phase. Made the size of rset table depend on the region size. Reviewed-by: apetrusenko, tonyp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp ! src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp ! src/share/vm/gc_implementation/g1/g1OopClosures.hpp ! src/share/vm/gc_implementation/g1/g1OopClosures.inline.hpp ! src/share/vm/gc_implementation/g1/g1RemSet.cpp ! src/share/vm/gc_implementation/g1/g1_globals.hpp ! src/share/vm/gc_implementation/g1/g1_specialized_oop_closures.hpp ! src/share/vm/gc_implementation/g1/heapRegionRemSet.cpp ! src/share/vm/gc_implementation/g1/heapRegionRemSet.hpp ! src/share/vm/gc_implementation/g1/sparsePRT.cpp ! src/share/vm/gc_implementation/g1/sparsePRT.hpp ! src/share/vm/memory/cardTableModRefBS.hpp ! src/share/vm/utilities/globalDefinitions.hpp Changeset: 58add740c4ee Author: johnc Date: 2010-02-16 14:11 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/58add740c4ee Merge ! 
src/share/vm/includeDB_core Changeset: e7b1cc79bd25 Author: kvn Date: 2010-02-16 16:17 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/e7b1cc79bd25 6926697: "optimized" VM build failed: The type "AdapterHandlerTableIterator" is incomplete Summary: Define AdapterHandlerTableIterator class as non product instead of debug. Reviewed-by: never ! src/share/vm/runtime/sharedRuntime.cpp Changeset: 106f41e88c85 Author: never Date: 2010-02-16 20:07 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/106f41e88c85 6877221: Endless deoptimizations in OSR nmethod Reviewed-by: kvn ! src/share/vm/opto/parse1.cpp Changeset: b4b440360f1e Author: twisti Date: 2010-02-18 11:35 +0100 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/b4b440360f1e 6926782: CodeBuffer size too small after 6921352 Summary: After 6921352 the CodeBuffer size was too small. Reviewed-by: kvn, never ! src/share/vm/opto/callGenerator.cpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/output.cpp Changeset: 3b687c53c266 Author: twisti Date: 2010-02-18 06:54 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/3b687c53c266 6927165: Zero S/390 fixes Summary: Fixes two failures on 31-bit S/390. Reviewed-by: twisti Contributed-by: Gary Benson ! src/cpu/zero/vm/globals_zero.hpp ! src/os_cpu/linux_zero/vm/os_linux_zero.hpp Changeset: 72f1840531a4 Author: twisti Date: 2010-02-18 10:44 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/72f1840531a4 Merge Changeset: 1f341bb67b5b Author: trims Date: 2010-02-18 22:15 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/1f341bb67b5b Merge Changeset: 6c9796468b91 Author: trims Date: 2010-02-18 22:16 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/6c9796468b91 6927886: Bump the HS17 build number to 10 Summary: Update the HS17 build number to 10 Reviewed-by: jcoomes ! 
make/hotspot_version From lana.steuck at sun.com Fri Mar 5 06:53:59 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 05 Mar 2010 06:53:59 +0000 Subject: hg: jdk7/tl/jaxp: Added tag jdk7-b84 for changeset 32c0cf01d555 Message-ID: <20100305065359.9313F43E73@hg.openjdk.java.net> Changeset: 6c0ccabb430d Author: mikejwre Date: 2010-02-18 13:31 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jaxp/rev/6c0ccabb430d Added tag jdk7-b84 for changeset 32c0cf01d555 ! .hgtags From lana.steuck at sun.com Fri Mar 5 06:54:06 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 05 Mar 2010 06:54:06 +0000 Subject: hg: jdk7/tl/jaxws: Added tag jdk7-b84 for changeset 8bc02839eee4 Message-ID: <20100305065406.7BDCD43E74@hg.openjdk.java.net> Changeset: 8424512588ff Author: mikejwre Date: 2010-02-18 13:31 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jaxws/rev/8424512588ff Added tag jdk7-b84 for changeset 8bc02839eee4 ! .hgtags From lana.steuck at sun.com Fri Mar 5 06:54:39 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 05 Mar 2010 06:54:39 +0000 Subject: hg: jdk7/tl/jdk: 5 new changesets Message-ID: <20100305065612.5781943E77@hg.openjdk.java.net> Changeset: a9b4fde406d4 Author: mikejwre Date: 2010-02-18 13:31 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/a9b4fde406d4 Added tag jdk7-b84 for changeset 7cb9388bb1a1 ! .hgtags Changeset: 2ba381560071 Author: dcherepanov Date: 2010-02-12 19:58 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/2ba381560071 6705345: Enable multiple file selection in AWT FileDialog Reviewed-by: art, anthony, alexp ! src/share/classes/java/awt/FileDialog.java ! src/share/classes/sun/awt/AWTAccessor.java ! src/solaris/classes/sun/awt/X11/XFileDialogPeer.java ! src/windows/classes/sun/awt/windows/WFileDialogPeer.java ! src/windows/native/sun/windows/awt_FileDialog.cpp ! 
src/windows/native/sun/windows/awt_FileDialog.h + test/java/awt/FileDialog/MultipleMode/MultipleMode.html + test/java/awt/FileDialog/MultipleMode/MultipleMode.java Changeset: d6d2de6ee2d1 Author: lana Date: 2010-02-19 15:13 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/d6d2de6ee2d1 Merge Changeset: b396584a3e64 Author: lana Date: 2010-02-23 10:17 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/b396584a3e64 Merge - make/java/text/FILES_java.gmk Changeset: c2d29e5695c2 Author: lana Date: 2010-03-04 13:40 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/c2d29e5695c2 Merge From lana.steuck at sun.com Fri Mar 5 07:03:30 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 05 Mar 2010 07:03:30 +0000 Subject: hg: jdk7/tl/langtools: 3 new changesets Message-ID: <20100305070338.B02E543E79@hg.openjdk.java.net> Changeset: 75d5bd12eb86 Author: mikejwre Date: 2010-02-18 13:31 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/75d5bd12eb86 Added tag jdk7-b84 for changeset d9cd5b8286e4 ! .hgtags Changeset: 136bfc679462 Author: lana Date: 2010-02-23 10:17 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/136bfc679462 Merge Changeset: c55733ceed61 Author: lana Date: 2010-03-04 13:40 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/c55733ceed61 Merge From martinrb at google.com Fri Mar 5 09:04:32 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 5 Mar 2010 01:04:32 -0800 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> Message-ID: <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> Hi Kevin, As you've noticed, creating objects within a factor of two of their natural limits is a good way to expose lurking bugs. I'm the one responsible for the algorithm in ArrayList. 
I'm a bit embarrassed, looking at that code today.
We could set the array size to Integer.MAX_VALUE,
but then you might hit an independent buglet in hotspot
that you cannot allocate an array with Integer.MAX_VALUE
elements, but Integer.MAX_VALUE - 5 (or so) works.

It occurs to me that increasing the size by 50% is better done by
int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;

I agree with the plan of setting the capacity to something near
MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.

These bugs are not known.
Chris Hegarty, could you file a bug for us?

Martin

On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern wrote:
> Greetings,
>
> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
> java.io.ByteArrayOutputStream which arise when the capacities of the data
> structures reach a particular threshold. More below.
>
> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its size
> reaches its capacity and an add or an insert operation is invoked, the
> capacity is increased by only one element. Notice that in the following
> excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) *
> oldCapacity + 1 unless this value would not suffice to accommodate the
> required capacity in which case it is set to the required capacity. If the
> current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity *
> 3)/2 + 1 overflows and resolves to a negative number resulting in the new
> capacity being set to the required capacity. The major consequence of this
> is that each subsequent add/insert operation results in a full resize of the
> ArrayList causing performance to degrade significantly.
>
>     int newCapacity = (oldCapacity * 3)/2 + 1;
>     if (newCapacity < minCapacity)
>         newCapacity = minCapacity;
>
> Hashtable breaks entirely when the size of its backing array reaches (1/2) *
> Integer.MAX_VALUE and a rehash is necessary as is evident from the following
> excerpt from rehash. Notice that rehash will attempt to create an array of
> negative size if the size of the backing array reaches (1/2) *
> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a
> negative number.
>
>     int newCapacity = oldCapacity * 2 + 1;
>     HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>
> When the capacity of the backing array in a ByteArrayOutputStream reaches
> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write
> operation is invoked, the capacity of the backing array is increased only by
> the required number of elements. Notice that in the following excerpt from
> ByteArrayOutputStream.write(int) the new backing array capacity is set to 2
> * buf.length unless this value would not suffice to accommodate the required
> capacity in which case it is set to the required capacity. If the current
> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then
> buf.length << 1 overflows and resolves to a negative number resulting in the
> new capacity being set to the required capacity. The major consequence of
> this, like with ArrayList, is that each subsequent write operation results
> in a full resize of the ByteArrayOutputStream causing performance to degrade
> significantly.
>
>     int newcount = count + 1;
>     if (newcount > buf.length) {
>         buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>     }
>
> It is interesting to note that any statements about the amortized time
> complexity of add/insert operations, such as the one in the ArrayList
> javadoc, are invalidated by the performance related bugs. One solution to
One solution to > the above situations is to set the new capacity of the backing array to > Integer.MAX_VALUE when the initial size calculation results in a negative > number during a resize. > > Apologies if these bugs are already known. > > Regards, > > Kevin > From develop4lasu at gmail.com Fri Mar 5 09:47:39 2010 From: develop4lasu at gmail.com (develop4lasu at gmail.com) Date: Fri, 05 Mar 2010 09:47:39 +0000 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> Message-ID: <0015174412fa29216404810a9b22@google.com> Hello, I'm using my own Collections if it's possible so I can add some thoughts: 1. I would decrease default array size to 4/6/8, for me it was few Mb more of free memory ( i suggest testing on application that use at least 300Mb) I would test: initial size: 4 long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2; initial size: 6 long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2; initial size: 8 long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2; initial size: 4 long newCapacity = ((long)oldCapacity) + (oldCapacity >> 2) + 4; and then use: > (int)Math.min(newCapacity, Integer.MAX_VALUE); Would be nice then for: Collections.addAll(...) to ask for proper capacity before adding any elements Greetings. W dniu 05-03-2010 10:04 Martin Buchholz napisa?(a): > Hi Kevin, > As you've noticed, creating objects within a factor of two of > their natural limits is a good way to expose lurking bugs. > I'm the one responsible for the algorithm in ArrayList. > I'ma bit embarrassed, looking at that code today. > We could set the array size to Integer.MAX_VALUE, > but then you might hit an independent buglet in hotspot > that you cannot allocate an array with Integer.MAX_VALUE > elements, but Integer.MAX_VALUE - 5 (or so) works. 
> It occurs to me that increasing the size by 50% is better done by
> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
> I agree with the plan of setting the capacity to something near
> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.
> These bugs are not known.
> Chris Hegarty, could you file a bug for us?
> Martin
> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern <kevin.l.stern at gmail.com> wrote:
> > Greetings,
> >
> > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
> > java.io.ByteArrayOutputStream which arise when the capacities of the data
> > structures reach a particular threshold. More below.
> >
> > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its size
> > reaches its capacity and an add or an insert operation is invoked, the
> > capacity is increased by only one element. Notice that in the following
> > excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) *
> > oldCapacity + 1 unless this value would not suffice to accommodate the
> > required capacity in which case it is set to the required capacity. If the
> > current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity *
> > 3)/2 + 1 overflows and resolves to a negative number resulting in the new
> > capacity being set to the required capacity. The major consequence of this
> > is that each subsequent add/insert operation results in a full resize of the
> > ArrayList causing performance to degrade significantly.
> >
> >     int newCapacity = (oldCapacity * 3)/2 + 1;
> >     if (newCapacity < minCapacity)
> >         newCapacity = minCapacity;
> >
> > Hashtable breaks entirely when the size of its backing array reaches (1/2) *
> > Integer.MAX_VALUE and a rehash is necessary as is evident from the following
> > excerpt from rehash.
> > Notice that rehash will attempt to create an array of
> > negative size if the size of the backing array reaches (1/2) *
> > Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a
> > negative number.
> >
> >     int newCapacity = oldCapacity * 2 + 1;
> >     HashtableEntry newTable[] = new HashtableEntry[newCapacity];
> >
> > When the capacity of the backing array in a ByteArrayOutputStream reaches
> > (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write
> > operation is invoked, the capacity of the backing array is increased only by
> > the required number of elements. Notice that in the following excerpt from
> > ByteArrayOutputStream.write(int) the new backing array capacity is set to 2
> > * buf.length unless this value would not suffice to accommodate the required
> > capacity in which case it is set to the required capacity. If the current
> > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then
> > buf.length << 1 overflows and resolves to a negative number resulting in the
> > new capacity being set to the required capacity. The major consequence of
> > this, like with ArrayList, is that each subsequent write operation results
> > in a full resize of the ByteArrayOutputStream causing performance to degrade
> > significantly.
> >
> >     int newcount = count + 1;
> >     if (newcount > buf.length) {
> >         buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
> >     }
> >
> > It is interesting to note that any statements about the amortized time
> > complexity of add/insert operations, such as the one in the ArrayList
> > javadoc, are invalidated by the performance related bugs. One solution to
> > the above situations is to set the new capacity of the backing array to
> > Integer.MAX_VALUE when the initial size calculation results in a negative
> > number during a resize.
> >
> > Apologies if these bugs are already known.
> >
> > Regards,
> >
> > Kevin
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Ulf.Zibis at gmx.de Fri Mar 5 10:06:10 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Fri, 05 Mar 2010 11:06:10 +0100
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
Message-ID: <4B90D792.8000207@gmx.de>

On 05.03.2010 10:04, Martin Buchholz wrote:
> Hi Kevin,
>
> As you've noticed, creating objects within a factor of two of
> their natural limits is a good way to expose lurking bugs.
>
> I'm the one responsible for the algorithm in ArrayList.
> I'm a bit embarrassed, looking at that code today.
> We could set the array size to Integer.MAX_VALUE,
> but then you might hit an independent buglet in hotspot
> that you cannot allocate an array with Integer.MAX_VALUE
> elements, but Integer.MAX_VALUE - 5 (or so) works.
>

I think, using a max size of Integer.MAX_VALUE - x looks awful, in
particular if it's badly commented in the sources.
I suggest to introduce something like System.MAX_COLLECTION_SIZE/CAPACITY
or .maxCollectionSize/Capacity().

-Ulf

From kevin.l.stern at gmail.com Fri Mar 5 10:39:10 2010
From: kevin.l.stern at gmail.com (Kevin L. Stern)
Date: Fri, 5 Mar 2010 04:39:10 -0600
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <4B90D792.8000207@gmx.de>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <4B90D792.8000207@gmx.de>
Message-ID: <1704b7a21003050239i33b66297kf8d03e25836ee462@mail.gmail.com>

FYI, HashMap independently defines a MAXIMUM_CAPACITY variable; it might be
a good idea to retrofit this and other such local definitions with any
system wide variables that are defined.

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

Regards,

Kevin

On Fri, Mar 5, 2010 at 4:06 AM, Ulf Zibis wrote:
> On 05.03.2010 10:04, Martin Buchholz wrote:
>
>> Hi Kevin,
>>
>> As you've noticed, creating objects within a factor of two of
>> their natural limits is a good way to expose lurking bugs.
>>
>> I'm the one responsible for the algorithm in ArrayList.
>> I'm a bit embarrassed, looking at that code today.
>> We could set the array size to Integer.MAX_VALUE,
>> but then you might hit an independent buglet in hotspot
>> that you cannot allocate an array with Integer.MAX_VALUE
>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>
>
> I think, using a max size of Integer.MAX_VALUE - x looks awful, in
> particular if it's badly commented in the sources.
> I suggest to introduce something like System.MAX_COLLECTION_SIZE/CAPACITY
> or .maxCollectionSize/Capacity().
>
> -Ulf
>

From kevin.l.stern at gmail.com Fri Mar 5 10:48:59 2010
From: kevin.l.stern at gmail.com (Kevin L. Stern)
Date: Fri, 5 Mar 2010 04:48:59 -0600
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
Message-ID: <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>

Hi Martin,

Thank you for your reply. If I may, PriorityQueue appears to employ the
simple strategy that I suggested above in its grow method:

        int newCapacity = ((oldCapacity < 64)?
((oldCapacity + 1) * 2): ((oldCapacity / 2) * 3)); if (newCapacity < 0) // overflow newCapacity = Integer.MAX_VALUE; It might be desirable to set a common strategy for capacity increase for all collections. Regards, Kevin On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz wrote: > Hi Kevin, > > As you've noticed, creating objects within a factor of two of > their natural limits is a good way to expose lurking bugs. > > I'm the one responsible for the algorithm in ArrayList. > I'm a bit embarrassed, looking at that code today. > We could set the array size to Integer.MAX_VALUE, > but then you might hit an independent buglet in hotspot > that you cannot allocate an array with Integer.MAX_VALUE > elements, but Integer.MAX_VALUE - 5 (or so) works. > > It occurs to me that increasing the size by 50% is better done by > int newCapacity = oldCapacity + (oldCapacity >> 1) + 1; > > I agree with the plan of setting the capacity to something near > MAX_VALUE on overflow, and throw OutOfMemoryError on next resize. > > These bugs are not known. > Chris Hegarty, could you file a bug for us? > > Martin > > On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern > wrote: > > Greetings, > > > > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and > > java.io.ByteArrayOutputStream which arise when the capacities of the data > > structures reach a particular threshold. More below. > > > > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its > size > > reaches its capacity and an add or an insert operation is invoked, the > > capacity is increased by only one element. Notice that in the following > > excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) * > > oldCapacity + 1 unless this value would not suffice to accommodate the > > required capacity in which case it is set to the required capacity. 
If > the > > current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity * > > 3)/2 + 1 overflows and resolves to a negative number resulting in the new > > capacity being set to the required capacity. The major consequence of > this > > is that each subsequent add/insert operation results in a full resize of > the > > ArrayList causing performance to degrade significantly. > > > > int newCapacity = (oldCapacity * 3)/2 + 1; > > if (newCapacity < minCapacity) > > newCapacity = minCapacity; > > > > Hashtable breaks entirely when the size of its backing array reaches > (1/2) * > > Integer.MAX_VALUE and a rehash is necessary as is evident from the > following > > excerpt from rehash. Notice that rehash will attempt to create an array > of > > negative size if the size of the backing array reaches (1/2) * > > Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a > > negative number. > > > > int newCapacity = oldCapacity * 2 + 1; > > HashtableEntry newTable[] = new HashtableEntry[newCapacity]; > > > > When the capacity of the backing array in a ByteArrayOutputStream reaches > > (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write > > operation is invoked, the capacity of the backing array is increased only > by > > the required number of elements. Notice that in the following excerpt > from > > ByteArrayOutputStream.write(int) the new backing array capacity is set to > 2 > > * buf.length unless this value would not suffice to accommodate the > required > > capacity in which case it is set to the required capacity. If the > current > > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then > > buf.length << 1 overflows and resolves to a negative number resulting in > the > > new capacity being set to the required capacity. 
The major consequence > of > > this, like with ArrayList, is that each subsequent write operation > results > > in a full resize of the ByteArrayOutputStream causing performance to > degrade > > significantly. > > > > int newcount = count + 1; > > if (newcount > buf.length) { > > buf = Arrays.copyOf(buf, Math.max(buf.length << 1, > newcount)); > > } > > > > It is interesting to note that any statements about the amortized time > > complexity of add/insert operations, such as the one in the ArrayList > > javadoc, are invalidated by the performance related bugs. One solution > to > > the above situations is to set the new capacity of the backing array to > > Integer.MAX_VALUE when the initial size calculation results in a negative > > number during a resize. > > > > Apologies if these bugs are already known. > > > > Regards, > > > > Kevin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.gibbons at sun.com Sat Mar 6 00:14:47 2010 From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com) Date: Sat, 06 Mar 2010 00:14:47 +0000 Subject: hg: jdk7/tl/langtools: 2 new changesets Message-ID: <20100306001501.DFE7143F6E@hg.openjdk.java.net> Changeset: a23282f17d0b Author: jjg Date: 2010-03-05 16:12 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/a23282f17d0b 6930108: IllegalArgumentException in AbstractDiagnosticFormatter for tools/javac/api/TestJavacTaskScanner.jav Reviewed-by: darcy ! src/share/classes/com/sun/tools/javac/util/BasicDiagnosticFormatter.java ! 
test/tools/javac/api/TestJavacTaskScanner.java + test/tools/javac/api/TestResolveError.java Changeset: a4f3b97c8028 Author: jjg Date: 2010-03-05 16:13 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/a4f3b97c8028 Merge From Kelly.Ohair at Sun.COM Sat Mar 6 17:18:17 2010 From: Kelly.Ohair at Sun.COM (Kelly O'Hair) Date: Sat, 06 Mar 2010 09:18:17 -0800 Subject: Test failure java/nio/channels/Selector/OpRead.java Message-ID: <4B928E59.1070005@sun.com> Just to record the event... TEST: java/nio/channels/Selector/OpRead.java Failed on Fedora 9 32bit machine prt-x2200-1.sfbay, NOT using -samevm. I'll file a bug if it repeats, or you ask for one to be filed. -kto -------------------------------------------------- TEST: java/nio/channels/Selector/OpRead.java JDK under test: (/tmp/jprt/P1/T/060322.ohair/testproduct/linux_i586_2.6-product) java version "1.7.0-2010-03-06-060322.ohair.jdk" Java(TM) SE Runtime Environment (build 1.7.0-2010-03-06-060322.ohair.jdk-jprtadm_2010_03_05_22_07-b00) Java HotSpot(TM) Server VM (build 17.0-b10, mixed mode) ACTION: build -- Passed. Build successful REASON: Named class compiled on demand TIME: 0.702 seconds messages: command: build OpRead reason: Named class compiled on demand elapsed time (seconds): 0.702 ACTION: compile -- Passed. Compilation successful REASON: .class file out of date or does not exist TIME: 0.702 seconds messages: command: compile /tmp/jprt/P1/T/060322.ohair/source/test/java/nio/channels/Selector/OpRead.java reason: .class file out of date or does not exist elapsed time (seconds): 0.702 STDOUT: STDERR: ACTION: main -- Failed. 
Execution failed: `main' threw exception: java.lang.RuntimeException: Test failed REASON: Assumed action based on file name: run main OpRead TIME: 1.146 seconds messages: command: main OpRead reason: Assumed action based on file name: run main OpRead elapsed time (seconds): 1.146 STDOUT: STDERR: java.lang.RuntimeException: Test failed at OpRead.test(OpRead.java:68) at OpRead.main(OpRead.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:613) at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94) at java.lang.Thread.run(Thread.java:717) JavaTest Message: Test threw exception: java.lang.RuntimeException: Test failed JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test failed TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Test failed -------------------------------------------------- From Alan.Bateman at Sun.COM Sat Mar 6 17:32:52 2010 From: Alan.Bateman at Sun.COM (Alan Bateman) Date: Sat, 06 Mar 2010 17:32:52 +0000 Subject: Test failure java/nio/channels/Selector/OpRead.java In-Reply-To: <4B928E59.1070005@sun.com> References: <4B928E59.1070005@sun.com> Message-ID: <4B9291C4.40205@sun.com> Kelly O'Hair wrote: > > Just to record the event... > > TEST: java/nio/channels/Selector/OpRead.java > > Failed on Fedora 9 32bit machine prt-x2200-1.sfbay, NOT using -samevm. > > I'll file a bug if it repeats, or you ask for one to be filed. > > -kto Looking at it now, the test has a timing issue and I'm surprised we haven't seen this failure before. So yes, please create a bug. -Alan. 
From Ulf.Zibis at gmx.de Sat Mar 6 21:00:19 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sat, 06 Mar 2010 22:00:19 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4B8EB46C.1010208@sun.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com>
Message-ID: <4B92C263.9020404@gmx.de>

Very fast Sherman, much thanks.

Could you set the bug to accepted and evaluated, so my patch will have a
chance to get into the code base?

-Ulf

On 03.03.2010 20:11, Xueming Shen wrote:
> #6931812
>
> Martin Buchholz wrote:
>> Sherman, would you like to file bugs for Ulf's improvements?
>>
>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote:
>>> On 03.03.2010 09:00, Martin Buchholz wrote:
>>
>>>> Keep in mind that supplementary characters are extremely rare.
>>>>
>>> Yes, but many API's in the JDK are used rarely.
>>> Why should they waste memory footprint / perform bad, particularly if it
>>> doesn't cost anything.
>>
>> I admire your perfectionism.
>>
>>>> Therefore the existing implementation
>>>>
>>>> return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>> && codePoint <= MAX_CODE_POINT;
>>>>
>>>> will almost always perform just one comparison against a constant,
>>>> which is hard to beat.
>>>>
>>> 1. Wondering: I think there are TWO comparisons.
>>> 2. Those comparisons need to load 32 bit values from machine code, against
>>> only 8 bit values in my case.
>>
>> It's a good point. In the machine code, shifts are likely to use
>> immediate values, and so will be a small win.
>> >> int x = codePoint >>> 16; >> return x != 0 && x < 0x11; >> >> (On modern hardware, these optimizations >> are less valuable than they used to be; >> ordinary integer arithmetic is almost free) >> >> Martin > > From kelly.ohair at sun.com Sat Mar 6 23:00:11 2010 From: kelly.ohair at sun.com (kelly.ohair at sun.com) Date: Sat, 06 Mar 2010 23:00:11 +0000 Subject: hg: jdk7/tl/jdk: 6915983: testing problems, adjusting list of tests, needs some investigation Message-ID: <20100306230106.5B8CA440AE@hg.openjdk.java.net> Changeset: 58b44ac0b10d Author: ohair Date: 2010-03-06 14:59 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/58b44ac0b10d 6915983: testing problems, adjusting list of tests, needs some investigation Reviewed-by: alanb ! test/Makefile ! test/ProblemList.txt From kelly.ohair at sun.com Sat Mar 6 23:01:34 2010 From: kelly.ohair at sun.com (kelly.ohair at sun.com) Date: Sat, 06 Mar 2010 23:01:34 +0000 Subject: hg: jdk7/tl: 2 new changesets Message-ID: <20100306230134.7DB32440AF@hg.openjdk.java.net> Changeset: 4d7419e4b759 Author: ohair Date: 2010-03-06 15:00 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/rev/4d7419e4b759 6928700: Configure top repo for JPRT testing Reviewed-by: alanb, jjg ! 
make/jprt.properties
+ test/Makefile

Changeset: f3664d6879ab
Author: ohair
Date: 2010-03-06 15:01 -0800
URL: http://hg.openjdk.java.net/jdk7/tl/rev/f3664d6879ab

Merge

From martinrb at google.com Tue Mar 9 02:10:37 2010
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 8 Mar 2010 18:10:37 -0800
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
Message-ID: <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>

[Chris or Alan, please review and file a bug]

OK, guys,

Here's a patch:

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/

Martin

On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern wrote:
> Hi Martin,
>
> Thank you for your reply. If I may, PriorityQueue appears to employ the
> simple strategy that I suggested above in its grow method:
>
>         int newCapacity = ((oldCapacity < 64)?
>                            ((oldCapacity + 1) * 2):
>                            ((oldCapacity / 2) * 3));
>         if (newCapacity < 0) // overflow
>             newCapacity = Integer.MAX_VALUE;
>
> It might be desirable to set a common strategy for capacity increase for all
> collections.
>
> Regards,
>
> Kevin
>
> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz wrote:
>>
>> Hi Kevin,
>>
>> As you've noticed, creating objects within a factor of two of
>> their natural limits is a good way to expose lurking bugs.
>>
>> I'm the one responsible for the algorithm in ArrayList.
>> I'm a bit embarrassed, looking at that code today.
>> We could set the array size to Integer.MAX_VALUE,
>> but then you might hit an independent buglet in hotspot
>> that you cannot allocate an array with Integer.MAX_VALUE
>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>
>> It occurs to me that increasing the size by 50% is better done by
>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>
>> I agree with the plan of setting the capacity to something near
>> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.
>>
>> These bugs are not known.
>> Chris Hegarty, could you file a bug for us?
>>
>> Martin
>>
>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern wrote:
>> > Greetings,
>> >
>> > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>> > java.io.ByteArrayOutputStream which arise when the capacities of the data
>> > structures reach a particular threshold. More below.
>> >
>> > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its size
>> > reaches its capacity and an add or an insert operation is invoked, the
>> > capacity is increased by only one element. Notice that in the following
>> > excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) *
>> > oldCapacity + 1 unless this value would not suffice to accommodate the
>> > required capacity in which case it is set to the required capacity. If the
>> > current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity *
>> > 3)/2 + 1 overflows and resolves to a negative number resulting in the new
>> > capacity being set to the required capacity. The major consequence of this
>> > is that each subsequent add/insert operation results in a full resize of the
>> > ArrayList causing performance to degrade significantly.
>> >
>> >     int newCapacity = (oldCapacity * 3)/2 + 1;
>> >     if (newCapacity < minCapacity)
>> >         newCapacity = minCapacity;
>> >
>> > Hashtable breaks entirely when the size of its backing array reaches (1/2) *
>> > Integer.MAX_VALUE and a rehash is necessary as is evident from the following
>> > excerpt from rehash. Notice that rehash will attempt to create an array of
>> > negative size if the size of the backing array reaches (1/2) *
>> > Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a
>> > negative number.
>> >
>> >     int newCapacity = oldCapacity * 2 + 1;
>> >     HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>> >
>> > When the capacity of the backing array in a ByteArrayOutputStream reaches
>> > (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write
>> > operation is invoked, the capacity of the backing array is increased only by
>> > the required number of elements. Notice that in the following excerpt from
>> > ByteArrayOutputStream.write(int) the new backing array capacity is set to 2
>> > * buf.length unless this value would not suffice to accommodate the required
>> > capacity in which case it is set to the required capacity. If the current
>> > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then
>> > buf.length << 1 overflows and resolves to a negative number resulting in the
>> > new capacity being set to the required capacity. The major consequence of
>> > this, like with ArrayList, is that each subsequent write operation results
>> > in a full resize of the ByteArrayOutputStream causing performance to degrade
>> > significantly.
>> >
>> >     int newcount = count + 1;
>> >     if (newcount > buf.length) {
>> >         buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>> >     }
>> >
>> > It is interesting to note that any statements about the amortized time
>> > complexity of add/insert operations, such as the one in the ArrayList
>> > javadoc, are invalidated by the performance related bugs. One solution to
>> > the above situations is to set the new capacity of the backing array to
>> > Integer.MAX_VALUE when the initial size calculation results in a negative
>> > number during a resize.
>> >
>> > Apologies if these bugs are already known.
>> >
>> > Regards,
>> >
>> > Kevin
>> >
>
>

From martinrb at google.com Tue Mar 9 02:13:38 2010
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 8 Mar 2010 18:13:38 -0800
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
Message-ID: <1ccfd1c11003081813o4ea436d2o2414182160e20d76@mail.gmail.com>

On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern wrote:
> Hi Martin,
>
> Thank you for your reply. If I may, PriorityQueue appears to employ the
> simple strategy that I suggested above in its grow method:
>
>         int newCapacity = ((oldCapacity < 64)?
>                            ((oldCapacity + 1) * 2):
>                            ((oldCapacity / 2) * 3));
>         if (newCapacity < 0) // overflow
>             newCapacity = Integer.MAX_VALUE;
>
> It might be desirable to set a common strategy for capacity increase for all
> collections.

The PriorityQueue implementation is better than always doubling,
but not better enough to change the expansion policy of existing
heavily used collection classes.
Martin From martinrb at google.com Tue Mar 9 02:20:08 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 8 Mar 2010 18:20:08 -0800 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <0015174412fa29216404810a9b22@google.com> References: <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <0015174412fa29216404810a9b22@google.com> Message-ID: <1ccfd1c11003081820o712ec40bk8f5ba446282cc5aa@mail.gmail.com> 2010/3/5 : > Hello, > > I'm using my own Collections if it's possible so I can add some thoughts: > > 1. I would decrease default array size to 4/6/8, for me it was few Mb more > of free memory ( i suggest testing on application that use at least 300Mb) > > I would test: > > initial size: 4 > long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2; > > initial size: 6 > long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2; > > initial size: 8 > long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2; > > initial size: 4 > long newCapacity = ((long)oldCapacity) + (oldCapacity >> 2) + 4; I agree that smaller initial sizes would be better, (and better yet would be to eventually shrink sizes of arrays!) but it's very hard to change the default behavior of classes in the JDK. Java benchmarks typically do not test memory-constrained environments, so the JDK usually optimizes for time over space. This is the kind of optimization that might better go into the less conservative IcedTea fork. > and then use: >> (int)Math.min(newCapacity, Integer.MAX_VALUE); The above expression always yields newCapacity. 
Martin From martinrb at google.com Tue Mar 9 02:37:52 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 8 Mar 2010 18:37:52 -0800 Subject: Support for PARTIAL_FLUSH in Deflater Message-ID: <1ccfd1c11003081837i41e7640bvf239ad351d65c887@mail.gmail.com> Hi FlaterMouses, We added support for various "flush modes" to Deflater, but we did not include support for PARTIAL_FLUSH. Because not even zlib.h is enthusiastic about PARTIAL_FLUSH: #define Z_PARTIAL_FLUSH 1 /* will be removed, use Z_SYNC_FLUSH instead */ But it sure looks like Z_PARTIAL_FLUSH will never actually be removed. It's been a few years, and PARTIAL_FLUSH is actively used by SSH implementations, as vaguely specified in RFC 4253. Costin argued for PARTIAL_FLUSH elsewhere: """ The jzlib library was written exactly for this reason - it was not possible to implement SSL using deflater, and with this change it still isn't possible. The following compression methods are currently defined: none REQUIRED no compression zlib OPTIONAL ZLIB (LZ77) compression The "zlib" compression is described in [RFC1950] and in [RFC1951]. The compression context is initialized after each key exchange, and is passed from one packet to the next, with only a partial flush being performed at the end of each packet. A partial flush means that the current compressed block is ended and all data will be output. If the current block is not a stored block, one or more empty blocks are added after the current block to ensure that there are at least 8 bits, counting from the start of the end-of-block code of the current block to the end of the packet payload. So anyone implementing SSH with compression will have to use introspection or alternate compression library. """ Since adding the missing support seems easier than arguing about it, I suggest we Just Do It. 
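For readers following the flush-mode discussion, here is a minimal sketch of per-packet flushing using a flush mode that java.util.zip.Deflater does expose. It uses SYNC_FLUSH rather than PARTIAL_FLUSH, since the latter is exactly what is missing. The class and method names (PacketFlushSketch, compressPacket, decompressPacket) and the 256-byte buffer are made up for illustration; this is not code from any SSH implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: one Deflater/Inflater pair is kept for the whole
// connection, so the compression context carries over between packets.
final class PacketFlushSketch {

    // Compress one packet and flush so the peer can decode it right away.
    static byte[] compressPacket(Deflater deflater, byte[] packet) {
        deflater.setInput(packet);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        do {
            // A full buffer means more flushed output may still be pending.
            n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length);
        return out.toByteArray();
    }

    // Decode everything that the given packet makes available.
    static byte[] decompressPacket(Inflater inflater, byte[] compressed) {
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            int n;
            while ((n = inflater.inflate(buf)) > 0) {
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt packet", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Deflater deflater = new Deflater();
        Inflater inflater = new Inflater();
        for (String s : new String[] {"first packet", "second packet"}) {
            byte[] wire = compressPacket(deflater, s.getBytes());
            System.out.println(new String(decompressPacket(inflater, wire)));
        }
        deflater.end();
        inflater.end();
    }
}
```

The sketch only shows the shape of the per-packet loop; whether SYNC_FLUSH output is acceptable to a given SSH peer is a separate question that it does not answer.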
Martin

From dmytro_sheyko at hotmail.com Tue Mar 9 10:19:08 2010
From: dmytro_sheyko at hotmail.com (Dmytro Sheyko)
Date: Tue, 9 Mar 2010 17:19:08 +0700
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>, <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>, <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>, <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
Message-ID:

Is there any reason to use comparison like this

if (newCapacity - minCapacity < 0)

if (newCapacity - MAX_ARRAY_SIZE > 0) {

instead of

if (newCapacity < minCapacity)

if (newCapacity > MAX_ARRAY_SIZE) {

Thanks,
Dmytro

> Date: Mon, 8 Mar 2010 18:10:37 -0800
> Subject: Re: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
> From: martinrb at google.com
> To: kevin.l.stern at gmail.com; christopher.hegarty at sun.com; alan.bateman at sun.com
> CC: core-libs-dev at openjdk.java.net
>
> [Chris or Alan, please review and file a bug]
>
> OK, guys,
>
> Here's a patch:
>
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>
> Martin

From kevin.l.stern at gmail.com Tue Mar 9 10:38:17 2010
From: kevin.l.stern at gmail.com (Kevin L.
Stern) Date: Tue, 9 Mar 2010 04:38:17 -0600 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> Message-ID: <1704b7a21003090238j74fcf04fs54e4d1aedbfa8a20@mail.gmail.com> These comparisons are essential to the working of Martin's algorithm. I found them interesting as well, but notice that when the capacity overflows these comparisons will always be false. That is to say: oldCapacity < minCapacity (given, otherwise we would not be resizing) therefore oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity - minCapacity < oldCapacity So if oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity > Integer.MAX_VALUE, subtracting minCapacity re-overflows back into the positive number realm. That being said, and this is a question/comment to all, I want to point out that this type of code assumes a particular class of orderly overflow behavior. Is this specified in the Java spec, or will this break on an obscure machine that does not use, say, two's complement arithmetic? 
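The difference between the two forms of comparison can be checked directly. The sketch below is illustrative only: the class name, the sample capacities and the MAX_ARRAY_SIZE value (Integer.MAX_VALUE - 8) are assumptions, not taken from the patch. It relies on Java int arithmetic wrapping around in two's complement, which the Java Language Specification defines for every platform.

```java
// Hypothetical demonstration class; not part of the proposed patch.
final class OverflowCompare {

    // Assumed limit for illustration; the thread does not state the value.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // 50% growth; the sum wraps to a negative int for large capacities.
    static int grow(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        int oldCapacity = 1500000000;
        int minCapacity = oldCapacity + 1;
        int newCapacity = grow(oldCapacity); // wrapped: -2044967296

        // Plain comparisons see a negative number.
        System.out.println(newCapacity < minCapacity);        // true
        System.out.println(newCapacity > MAX_ARRAY_SIZE);     // false

        // Difference comparisons wrap a second time and undo the effect.
        System.out.println(newCapacity - minCapacity < 0);    // false
        System.out.println(newCapacity - MAX_ARRAY_SIZE > 0); // true
    }
}
```

With the plain comparisons the wrapped value looks too small and is never recognised as too large. The difference forms reach the opposite conclusions, which is what lets the resize code notice that the growth computation overflowed.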
Regards, Kevin 2010/3/9 Dmytro Sheyko > Is there any reason to use comparison like this > > if (newCapacity - minCapacity < 0) > > if (newCapacity - MAX_ARRAY_SIZE > 0) { > > instead of > > if (newCapacity < minCapacity) > > if (newCapacity > MAX_ARRAY_SIZE) { > > Thanks, > Dmytro > > > Date: Mon, 8 Mar 2010 18:10:37 -0800 > > Subject: Re: Bugs in java.util.ArrayList, java.util.Hashtable and > java.io.ByteArrayOutputStream > > From: martinrb at google.com > > To: kevin.l.stern at gmail.com; christopher.hegarty at sun.com; > alan.bateman at sun.com > > CC: core-libs-dev at openjdk.java.net > > > > > [Chris or Alan, please review and file a bug] > > > > OK, guys, > > > > Here's a patch: > > > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ > > > > Martin > > > ------------------------------ > Hotmail: Trusted email with powerful SPAM protection. Sign up now. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.l.stern at gmail.com Tue Mar 9 11:02:21 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Tue, 9 Mar 2010 05:02:21 -0600 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1704b7a21003090238j74fcf04fs54e4d1aedbfa8a20@mail.gmail.com> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <1704b7a21003090238j74fcf04fs54e4d1aedbfa8a20@mail.gmail.com> Message-ID: <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com> I did a quick search and it appears that Java is indeed two's complement based. 
Nonetheless, please allow me to point out that, in general, this type of code worries me since I fully expect that at some point someone will come along and do exactly what Dmytro suggested; that is, someone will change: if (a - b > 0) to if (a > b) and the entire ship will sink. I, personally, like to avoid obscurities such as making integer overflow an essential basis for my algorithm unless there is a good reason to do so. I would, in general, prefer to avoid overflow altogether and to make the overflow scenario more explicit: if (oldCapacity > RESIZE_OVERFLOW_THRESHOLD) { // do something } else { // do something else } Of course, these are simply my coding preferences and I may very well be missing the 'good reason' to take the current approach. Regards, Kevin On Tue, Mar 9, 2010 at 4:38 AM, Kevin L. Stern wrote: > These comparisons are essential to the working of Martin's algorithm. I > found them interesting as well, but notice that when the capacity overflows > these comparisons will always be false. That is to say: > > oldCapacity < minCapacity (given, otherwise we would not be resizing) > therefore oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity - > minCapacity < oldCapacity > > So if oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity > > Integer.MAX_VALUE, subtracting minCapacity re-overflows back into the > positive number realm. > > That being said, and this is a question/comment to all, I want to point out > that this type of code assumes a particular class of orderly overflow > behavior. Is this specified in the Java spec, or will this break on an > obscure machine that does not use, say, two's complement arithmetic? 
> > Regards, > > Kevin > > 2010/3/9 Dmytro Sheyko > > Is there any reason to use comparison like this >> >> if (newCapacity - minCapacity < 0) >> >> if (newCapacity - MAX_ARRAY_SIZE > 0) { >> >> instead of >> >> if (newCapacity < minCapacity) >> >> if (newCapacity > MAX_ARRAY_SIZE) { >> >> Thanks, >> Dmytro >> >> > Date: Mon, 8 Mar 2010 18:10:37 -0800 >> > Subject: Re: Bugs in java.util.ArrayList, java.util.Hashtable and >> java.io.ByteArrayOutputStream >> > From: martinrb at google.com >> > To: kevin.l.stern at gmail.com; christopher.hegarty at sun.com; >> alan.bateman at sun.com >> > CC: core-libs-dev at openjdk.java.net >> >> > >> > [Chris or Alan, please review and file a bug] >> > >> > OK, guys, >> > >> > Here's a patch: >> > >> > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ >> > >> > Martin >> >> >> ------------------------------ >> Hotmail: Trusted email with powerful SPAM protection. Sign up now. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christopher.Hegarty at Sun.COM Tue Mar 9 11:41:30 2010 From: Christopher.Hegarty at Sun.COM (Christopher Hegarty -Sun Microsystems Ireland) Date: Tue, 09 Mar 2010 11:41:30 +0000 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> Message-ID: <4B9633EA.8070101@sun.com> Sorry Martin, I appear to have missed your original request to file this bug. I since filed the following: 6933217: Huge arrays handled poorly in core libraries The changes you are proposing seem reasonable to me. -Chris. 
Martin Buchholz wrote: > [Chris or Alan, please review and file a bug] > > OK, guys, > > Here's a patch: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ > > Martin > > On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern wrote: >> Hi Martin, >> >> Thank you for your reply. If I may, PriorityQueue appears to employ the >> simple strategy that I suggested above in its grow method: >> >> int newCapacity = ((oldCapacity < 64)? >> ((oldCapacity + 1) * 2): >> ((oldCapacity / 2) * 3)); >> if (newCapacity < 0) // overflow >> newCapacity = Integer.MAX_VALUE; >> >> It might be desirable to set a common strategy for capacity increase for all >> collections. >> >> Regards, >> >> Kevin >> >> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz wrote: >>> Hi Kevin, >>> >>> As you've noticed, creating objects within a factor of two of >>> their natural limits is a good way to expose lurking bugs. >>> >>> I'm the one responsible for the algorithm in ArrayList. >>> I'm a bit embarrassed, looking at that code today. >>> We could set the array size to Integer.MAX_VALUE, >>> but then you might hit an independent buglet in hotspot >>> that you cannot allocate an array with Integer.MAX_VALUE >>> elements, but Integer.MAX_VALUE - 5 (or so) works. >>> >>> It occurs to me that increasing the size by 50% is better done by >>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1; >>> >>> I agree with the plan of setting the capacity to something near >>> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize. >>> >>> These bugs are not known. >>> Chris Hegarty, could you file a bug for us? >>> >>> Martin >>> >>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern >>> wrote: >>>> Greetings, >>>> >>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and >>>> java.io.ByteArrayOutputStream which arise when the capacities of the >>>> data >>>> structures reach a particular threshold. More below. 
>>>> >>>> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its >>>> size >>>> reaches its capacity and an add or an insert operation is invoked, the >>>> capacity is increased by only one element. Notice that in the following >>>> excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) * >>>> oldCapacity + 1 unless this value would not suffice to accommodate the >>>> required capacity in which case it is set to the required capacity. If >>>> the >>>> current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity >>>> * >>>> 3)/2 + 1 overflows and resolves to a negative number resulting in the >>>> new >>>> capacity being set to the required capacity. The major consequence of >>>> this >>>> is that each subsequent add/insert operation results in a full resize of >>>> the >>>> ArrayList causing performance to degrade significantly. >>>> >>>> int newCapacity = (oldCapacity * 3)/2 + 1; >>>> if (newCapacity < minCapacity) >>>> newCapacity = minCapacity; >>>> >>>> Hashtable breaks entirely when the size of its backing array reaches >>>> (1/2) * >>>> Integer.MAX_VALUE and a rehash is necessary as is evident from the >>>> following >>>> excerpt from rehash. Notice that rehash will attempt to create an array >>>> of >>>> negative size if the size of the backing array reaches (1/2) * >>>> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a >>>> negative number. >>>> >>>> int newCapacity = oldCapacity * 2 + 1; >>>> HashtableEntry newTable[] = new HashtableEntry[newCapacity]; >>>> >>>> When the capacity of the backing array in a ByteArrayOutputStream >>>> reaches >>>> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write >>>> operation is invoked, the capacity of the backing array is increased >>>> only by >>>> the required number of elements. 
Notice that in the following excerpt >>>> from >>>> ByteArrayOutputStream.write(int) the new backing array capacity is set >>>> to 2 >>>> * buf.length unless this value would not suffice to accommodate the >>>> required >>>> capacity in which case it is set to the required capacity. If the >>>> current >>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then >>>> buf.length << 1 overflows and resolves to a negative number resulting in >>>> the >>>> new capacity being set to the required capacity. The major consequence >>>> of >>>> this, like with ArrayList, is that each subsequent write operation >>>> results >>>> in a full resize of the ByteArrayOutputStream causing performance to >>>> degrade >>>> significantly. >>>> >>>> int newcount = count + 1; >>>> if (newcount > buf.length) { >>>> buf = Arrays.copyOf(buf, Math.max(buf.length << 1, >>>> newcount)); >>>> } >>>> >>>> It is interesting to note that any statements about the amortized time >>>> complexity of add/insert operations, such as the one in the ArrayList >>>> javadoc, are invalidated by the performance related bugs. One solution >>>> to >>>> the above situations is to set the new capacity of the backing array to >>>> Integer.MAX_VALUE when the initial size calculation results in a >>>> negative >>>> number during a resize. >>>> >>>> Apologies if these bugs are already known. 
>>>>
>>>> Regards,
>>>>
>>>> Kevin
>>>>
>>

From Ulf.Zibis at gmx.de Tue Mar 9 11:59:26 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Tue, 09 Mar 2010 12:59:26 +0100
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
Message-ID: <4B96381E.6030902@gmx.de>

In PriorityQueue:

Let's say newCapacity results in 0xFFFF.FFFC = -4
then "if (newCapacity - MAX_ARRAY_SIZE > 0)" ---> false
then Arrays.copyOf(queue, newCapacity) ---> ArrayIndexOutOfBoundsException

Am I wrong?

2.) Why don't you prefer a system-wide constant for MAX_ARRAY_SIZE?

-Ulf

On 09.03.2010 03:10, Martin Buchholz wrote:
> [Chris or Alan, please review and file a bug]
>
> OK, guys,
>
> Here's a patch:
>
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>
> Martin
>
> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern wrote:
>
>> Hi Martin,
>>
>> Thank you for your reply. If I may, PriorityQueue appears to employ the
>> simple strategy that I suggested above in its grow method:
>>
>>     int newCapacity = ((oldCapacity < 64)?
>>                        ((oldCapacity + 1) * 2):
>>                        ((oldCapacity / 2) * 3));
>>     if (newCapacity < 0) // overflow
>>         newCapacity = Integer.MAX_VALUE;
>>
>> It might be desirable to set a common strategy for capacity increase for all
>> collections.
>>
>> Regards,
>>
>> Kevin
>>
>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz wrote:
>>
>>> Hi Kevin,
>>>
>>> As you've noticed, creating objects within a factor of two of
>>> their natural limits is a good way to expose lurking bugs.
>>>
>>> I'm the one responsible for the algorithm in ArrayList.
>>> I'm a bit embarrassed, looking at that code today.
>>> We could set the array size to Integer.MAX_VALUE, >>> but then you might hit an independent buglet in hotspot >>> that you cannot allocate an array with Integer.MAX_VALUE >>> elements, but Integer.MAX_VALUE - 5 (or so) works. >>> >>> It occurs to me that increasing the size by 50% is better done by >>> int newCapacity = oldCapacity + (oldCapacity>> 1) + 1; >>> >>> I agree with the plan of setting the capacity to something near >>> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize. >>> >>> These bugs are not known. >>> Chris Hegarty, could you file a bug for us? >>> >>> Martin >>> >>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern >>> wrote: >>> >>>> Greetings, >>>> >>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and >>>> java.io.ByteArrayOutputStream which arise when the capacities of the >>>> data >>>> structures reach a particular threshold. More below. >>>> >>>> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its >>>> size >>>> reaches its capacity and an add or an insert operation is invoked, the >>>> capacity is increased by only one element. Notice that in the following >>>> excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) * >>>> oldCapacity + 1 unless this value would not suffice to accommodate the >>>> required capacity in which case it is set to the required capacity. If >>>> the >>>> current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity >>>> * >>>> 3)/2 + 1 overflows and resolves to a negative number resulting in the >>>> new >>>> capacity being set to the required capacity. The major consequence of >>>> this >>>> is that each subsequent add/insert operation results in a full resize of >>>> the >>>> ArrayList causing performance to degrade significantly. 
>>>> >>>> int newCapacity = (oldCapacity * 3)/2 + 1; >>>> if (newCapacity< minCapacity) >>>> newCapacity = minCapacity; >>>> >>>> Hashtable breaks entirely when the size of its backing array reaches >>>> (1/2) * >>>> Integer.MAX_VALUE and a rehash is necessary as is evident from the >>>> following >>>> excerpt from rehash. Notice that rehash will attempt to create an array >>>> of >>>> negative size if the size of the backing array reaches (1/2) * >>>> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a >>>> negative number. >>>> >>>> int newCapacity = oldCapacity * 2 + 1; >>>> HashtableEntry newTable[] = new HashtableEntry[newCapacity]; >>>> >>>> When the capacity of the backing array in a ByteArrayOutputStream >>>> reaches >>>> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write >>>> operation is invoked, the capacity of the backing array is increased >>>> only by >>>> the required number of elements. Notice that in the following excerpt >>>> from >>>> ByteArrayOutputStream.write(int) the new backing array capacity is set >>>> to 2 >>>> * buf.length unless this value would not suffice to accommodate the >>>> required >>>> capacity in which case it is set to the required capacity. If the >>>> current >>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then >>>> buf.length<< 1 overflows and resolves to a negative number resulting in >>>> the >>>> new capacity being set to the required capacity. The major consequence >>>> of >>>> this, like with ArrayList, is that each subsequent write operation >>>> results >>>> in a full resize of the ByteArrayOutputStream causing performance to >>>> degrade >>>> significantly. 
>>>> >>>> int newcount = count + 1; >>>> if (newcount> buf.length) { >>>> buf = Arrays.copyOf(buf, Math.max(buf.length<< 1, >>>> newcount)); >>>> } >>>> >>>> It is interesting to note that any statements about the amortized time >>>> complexity of add/insert operations, such as the one in the ArrayList >>>> javadoc, are invalidated by the performance related bugs. One solution >>>> to >>>> the above situations is to set the new capacity of the backing array to >>>> Integer.MAX_VALUE when the initial size calculation results in a >>>> negative >>>> number during a resize. >>>> >>>> Apologies if these bugs are already known. >>>> >>>> Regards, >>>> >>>> Kevin >>>> >>>> >> >> > > From kevin.l.stern at gmail.com Tue Mar 9 12:04:40 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Tue, 9 Mar 2010 06:04:40 -0600 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <1704b7a21003090238j74fcf04fs54e4d1aedbfa8a20@mail.gmail.com> <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com> Message-ID: <1704b7a21003090404o1a3a3e56x4aa1b89c88466c28@mail.gmail.com> Please excuse me - Martin is saving an 'if' statement in the vast majority of scenarios since, presumably, the overflow scenario occurs very infrequently (given that the bug has been in place for quite awhile). On Tue, Mar 9, 2010 at 5:02 AM, Kevin L. Stern wrote: > I did a quick search and it appears that Java is indeed two's complement > based. 
Nonetheless, please allow me to point out that, in general, this > type of code worries me since I fully expect that at some point someone will > come along and do exactly what Dmytro suggested; that is, someone will > change: > > if (a - b > 0) > > to > > if (a > b) > > and the entire ship will sink. I, personally, like to avoid obscurities > such as making integer overflow an essential basis for my algorithm unless > there is a good reason to do so. I would, in general, prefer to avoid > overflow altogether and to make the overflow scenario more explicit: > > if (oldCapacity > RESIZE_OVERFLOW_THRESHOLD) { > // do something > } else { > // do something else > } > > Of course, these are simply my coding preferences and I may very well be > missing the 'good reason' to take the current approach. > > Regards, > > Kevin > > > On Tue, Mar 9, 2010 at 4:38 AM, Kevin L. Stern wrote: > >> These comparisons are essential to the working of Martin's algorithm. I >> found them interesting as well, but notice that when the capacity overflows >> these comparisons will always be false. That is to say: >> >> oldCapacity < minCapacity (given, otherwise we would not be resizing) >> therefore oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity - >> minCapacity < oldCapacity >> >> So if oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity > >> Integer.MAX_VALUE, subtracting minCapacity re-overflows back into the >> positive number realm. >> >> That being said, and this is a question/comment to all, I want to point >> out that this type of code assumes a particular class of orderly overflow >> behavior. Is this specified in the Java spec, or will this break on an >> obscure machine that does not use, say, two's complement arithmetic? 
>> >> Regards, >> >> Kevin >> >> 2010/3/9 Dmytro Sheyko >> >> Is there any reason to use comparison like this >>> >>> if (newCapacity - minCapacity < 0) >>> >>> if (newCapacity - MAX_ARRAY_SIZE > 0) { >>> >>> instead of >>> >>> if (newCapacity < minCapacity) >>> >>> if (newCapacity > MAX_ARRAY_SIZE) { >>> >>> Thanks, >>> Dmytro >>> >>> > Date: Mon, 8 Mar 2010 18:10:37 -0800 >>> > Subject: Re: Bugs in java.util.ArrayList, java.util.Hashtable and >>> java.io.ByteArrayOutputStream >>> > From: martinrb at google.com >>> > To: kevin.l.stern at gmail.com; christopher.hegarty at sun.com; >>> alan.bateman at sun.com >>> > CC: core-libs-dev at openjdk.java.net >>> >>> > >>> > [Chris or Alan, please review and file a bug] >>> > >>> > OK, guys, >>> > >>> > Here's a patch: >>> > >>> > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ >>> > >>> > Martin >>> >>> >>> ------------------------------ >>> Hotmail: Trusted email with powerful SPAM protection. Sign up now. >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ulf.Zibis at gmx.de Tue Mar 9 15:44:22 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 09 Mar 2010 16:44:22 +0100 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <1704b7a21003090238j74fcf04fs54e4d1aedbfa8a20@mail.gmail.com> <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com> Message-ID: <4B966CD6.1020502@gmx.de> Am 09.03.2010 12:02, schrieb Kevin L. Stern: > I did a quick search and it appears that Java is indeed two's > complement based. 
Nonetheless, please allow me to point out that, in
> general, this type of code worries me since I fully expect that at
> some point someone will come along and do exactly what Dmytro
> suggested; that is, someone will change:
>
> if (a - b > 0)
>
> to
>
> if (a > b)
>
> and the entire ship will sink. I, personally, like to avoid
> obscurities such as making integer overflow an essential basis for my
> algorithm unless there is a good reason to do so. I would, in
> general, prefer to avoid overflow altogether and to make the overflow
> scenario more explicit:

+1

I think those optimizations should be done by HotSpot.

-Ulf

From martinrb at google.com Tue Mar 9 19:18:32 2010
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 9 Mar 2010 11:18:32 -0800
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <4B96381E.6030902@gmx.de>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <4B96381E.6030902@gmx.de>
Message-ID: <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>

On Tue, Mar 9, 2010 at 03:59, Ulf Zibis wrote:
> In PriorityQueue:
>
> Let's say newCapacity results in 0xFFFF.FFFC = -4
> then "if (newCapacity - MAX_ARRAY_SIZE > 0)" ---> false
> then Arrays.copyOf(queue, newCapacity) ---> ArrayIndexOutOfBoundsException

How could newCapacity ever become -4?
Since growth is by 50%. But even 100% looks safe...

> Am I wrong?
>
> 2.) Why don't you prefer a system-wide constant for MAX_ARRAY_SIZE?

This should never become a public API - it's a bug in the VM.

I prefer the duplication of code to creating a new external dependency.
Martin

> -Ulf

From opinali at gmail.com Tue Mar 9 20:02:23 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Tue, 9 Mar 2010 17:02:23 -0300
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <4B96381E.6030902@gmx.de> <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
Message-ID:

Should we really consider this a VM bug? I'm not sure that it's a good idea to allocate a single object whose size exceeds 4Gb (for a byte[] - due to the object header and array size field) - even on a 64-bit VM. An array with 2^32 elements is impossible, the maximum allowed by the size field is 2^32-1 which will be just as bad as 2^32-N for any other tiny positive N, for algorithms that love arrays of [base-2-] "round" sizes.

And then if this bug is fixed, it may have slightly different variations. For a long[] or double[] array, the allocation for the maximum size would exceed 32Gb, so it exceeds the maximum heap size for 64-bit HotSpot with CompressedOops. (Ok, this is an artificial issue because we won't likely have a 100% free heap, so the only impediment for "new long[2^32-1]" would be the array header.)

My suggestion: impose some fixed N (maybe 64, or 0x100, ...), limiting arrays to 2^32-N (for ANY element type). The artificial restriction should be large enough to fit the object header of any vendor's JVM, plus the per-object overhead of any reasonable heap structure. This limit could be added to the spec, so the implementation is not a bug anymore :) and it would be a portable limit.
Otherwise, some app may work reliably on HotSpot if it relies on the fact that 2^32-5 positions are possible, but may break on some other vendor's JVM where perhaps the implementation limit is 2^32-13 or something else.

A+
Osvaldo

2010/3/9 Martin Buchholz
> On Tue, Mar 9, 2010 at 03:59, Ulf Zibis wrote:
> > In PriorityQueue:
> >
> > Let's say newCapacity results in 0xFFFF.FFFC = -4
> > then "if (newCapacity - MAX_ARRAY_SIZE > 0)" ---> false
> > then Arrays.copyOf(queue, newCapacity) --->
> ArrayIndexOutOfBoundsException
>
> How could newCapacity ever become -4?
> Since growth is by 50%. But even 100% looks safe...
>
> > Am I wrong?
> >
> > 2.) Why don't you prefer a system-wide constant for MAX_ARRAY_SIZE?
>
> This should never become a public API - it's a bug in the VM.
>
> I prefer the duplication of code to creating a new external dependency.
>
> Martin
>
> > -Ulf
>

From martinrb at google.com Tue Mar 9 20:04:06 2010
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 9 Mar 2010 12:04:06 -0800
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <1704b7a21003090238j74fcf04fs54e4d1aedbfa8a20@mail.gmail.com> <1704b7a21003090302p5bd875f3wbc7bf131296cb8dd@mail.gmail.com>
Message-ID: <1ccfd1c11003091204q4d9e43a5g43fd8454059d1c88@mail.gmail.com>

On Tue, Mar 9, 2010 at 03:02, Kevin L. Stern wrote:
> I did a quick search and it appears that Java is indeed two's complement
> based.
Nonetheless, please allow me to point out that, in general, this
> type of code worries me since I fully expect that at some point someone will
> come along and do exactly what Dmytro suggested; that is, someone will
> change:
>
> if (a - b > 0)
>
> to
>
> if (a > b)
>
> and the entire ship will sink. I, personally, like to avoid obscurities
> such as making integer overflow an essential basis for my algorithm unless
> there is a good reason to do so. I would, in general, prefer to avoid
> overflow altogether and to make the overflow scenario more explicit:
>
> if (oldCapacity > RESIZE_OVERFLOW_THRESHOLD) {
>     // do something
> } else {
>     // do something else
> }

It's a good point.

In ArrayList we cannot do this (or at least not compatibly) because
ensureCapacity is a public API and effectively already accepts negative
numbers as requests for a positive capacity that cannot be satisfied.

The current API is used like this:

    int newcount = count + len;
    ensureCapacity(newcount);

If you want to avoid overflow, you would need to change to something
less natural like

    ensureCapacity(count, len);
    int newcount = count + len;

Anyways, I'm keeping the overflow-conscious code, but adding more warning
comments, and "out-lining" huge array creation so that ArrayList's code
now looks like:

    /**
     * Increases the capacity of this ArrayList instance, if
     * necessary, to ensure that it can hold at least the number of elements
     * specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     */
    public void ensureCapacity(int minCapacity) {
        modCount++;
        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Increases the capacity to ensure that it can hold at least the
     * number of elements specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     */
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    private int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }

Webrev regenerated.

Martin

> Of course, these are simply my coding preferences and I may very well be
> missing the 'good reason' to take the current approach.
>
> Regards,
>
> Kevin
>
> On Tue, Mar 9, 2010 at 4:38 AM, Kevin L. Stern
> wrote:
>>
>> These comparisons are essential to the working of Martin's algorithm. I
>> found them interesting as well, but notice that when the capacity overflows
>> these comparisons will always be false. That is to say:
>>
>> oldCapacity < minCapacity (given, otherwise we would not be resizing)
>> therefore oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity -
>> minCapacity < oldCapacity
>>
>> So if oldCapacity + (0.5 for ArrayList, else 1) * oldCapacity >
>> Integer.MAX_VALUE, subtracting minCapacity re-overflows back into the
>> positive number realm.
>>
>> That being said, and this is a question/comment to all, I want to point
>> out that this type of code assumes a particular class of orderly overflow
>> behavior.
Is this specified in the Java spec, or will this break on an >> obscure machine that does not use, say, two's complement arithmetic? >> >> Regards, >> >> Kevin >> >> 2010/3/9 Dmytro Sheyko >>> >>> Is there any reason to use comparison like this >>> >>> if (newCapacity - minCapacity < 0) >>> >>> if (newCapacity - MAX_ARRAY_SIZE > 0) { >>> >>> instead of >>> >>> if (newCapacity < minCapacity) >>> >>> if (newCapacity > MAX_ARRAY_SIZE) { >>> >>> Thanks, >>> Dmytro >>> >>> > Date: Mon, 8 Mar 2010 18:10:37 -0800 >>> > Subject: Re: Bugs in java.util.ArrayList, java.util.Hashtable and >>> > java.io.ByteArrayOutputStream >>> > From: martinrb at google.com >>> > To: kevin.l.stern at gmail.com; christopher.hegarty at sun.com; >>> > alan.bateman at sun.com >>> > CC: core-libs-dev at openjdk.java.net >>> > >>> > [Chris or Alan, please review and file a bug] >>> > >>> > OK, guys, >>> > >>> > Here's a patch: >>> > >>> > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ >>> > >>> > Martin >>> >>> >>> ________________________________ >>> Hotmail: Trusted email with powerful SPAM protection. Sign up now. > > From martinrb at google.com Tue Mar 9 20:15:41 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 9 Mar 2010 12:15:41 -0800 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <4B96381E.6030902@gmx.de> <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com> Message-ID: <1ccfd1c11003091215x7cc353calf8b03579a86a58f9@mail.gmail.com> It surely is not a good idea to use a single backing array for huge arrays. As you point out, it's up to 32GB for just one object. 
But the core JDK doesn't offer a suitable alternative for users who need very large collections.

It would have been more in the spirit of Java to have a collection class instead of ArrayList that was not fastest at any particular operation, but had excellent asymptotic behaviour, based on backing arrays containing backing arrays. But:
- no such excellent class has been written yet (or please point me to such a class)
- even if it were, such a best-of-breed general-purpose List implementation would probably need to be introduced as a separate class, because of the performance expectations of existing implementations.

In the meantime, we have to maintain what we've got, and that includes living with arrays and classes that wrap them.

Changing the spec is unlikely to succeed.

Martin

On Tue, Mar 9, 2010 at 12:02, Osvaldo Doederlein wrote:
> Should we really consider this a VM bug? I'm not sure that it's a good idea
> to allocate a single object whose size exceeds 4Gb (for a byte[] - due to
> the object header and array size field) - even on a 64-bit VM. An array with
> 2^32 elements is impossible, the maximum allowed by the size field is 2^32-1
> which will be just as bad as 2^32-N for any other tiny positive N, for
> algorithms that love arrays of [base-2-] "round" sizes.
>
> And then if this bug is fixed, it may have slightly different variations.
> For a long[] or double[] array, the allocation for the maximum size would
> exceed 32Gb, so it exceeds the maximum heap size for 64-bit HotSpot with
> CompressedOops. (Ok, this is an artificial issue because we won't likely have a
> 100% free heap, so the only impediment for "new long[2^32-1]" would be the
> array header.)
>
> My suggestion: impose some fixed N (maybe 64, or 0x100, ...), limiting
> arrays to 2^32-N (for ANY element type). The artificial restriction should
> be large enough to fit the object header of any vendor's JVM, plus the
> per-object overhead of any reasonable heap structure.
This limit could be
> added to the spec, so the implementation is not a bug anymore :) and it
> would be a portable limit. Otherwise, some app may work reliably on HotSpot
> if it relies on the fact that 2^32-5 positions are possible, but may break
> on some other vendor's JVM where perhaps the implementation limit is 2^32-13
> or something else.
>
> A+
> Osvaldo
>
> 2010/3/9 Martin Buchholz
>>
>> On Tue, Mar 9, 2010 at 03:59, Ulf Zibis wrote:
>> > In PriorityQueue:
>> >
>> > let's result newCapacity in 0xFFFF.FFFC = -4
>> > then "if (newCapacity - MAX_ARRAY_SIZE > 0)" ---> false
>> > then Arrays.copyOf(queue, newCapacity) --->
>> > ArrayIndexOutOfBoundsException
>>
>> How could newCapacity ever become -4?
>> Since growth is by 50%. But even 100% looks safe...
>>
>> > Am I wrong ?
>> >
>> > 2.) Why don't you prefer a system-wide constant for MAX_ARRAY_SIZE ???
>>
>> This should never become a public API - it's a bug in the VM.
>>
>> I prefer the duplication of code to creating a new external dependency.
>>
>> Martin
>>
>> > -Ulf
>
>

From Ulf.Zibis at gmx.de Tue Mar 9 20:25:28 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Tue, 09 Mar 2010 21:25:28 +0100
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>
 <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
 <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
 <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
 <4B96381E.6030902@gmx.de>
 <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
Message-ID: <4B96AEB8.4070406@gmx.de>

On 09.03.2010 20:18, Martin Buchholz wrote:
> On Tue, Mar 9, 2010 at 03:59, Ulf Zibis wrote:
>
>> In PriorityQueue:
>>
>> let's result newCapacity in 0xFFFF.FFFC = -4
>> then "if (newCapacity - MAX_ARRAY_SIZE > 0)" ---> false
>> then Arrays.copyOf(queue, newCapacity) ---> ArrayIndexOutOfBoundsException
>>
> How could newCapacity ever become -4?
> Since growth is by 50%.

Oops, I must admit that I didn't evaluate that. Many tricks are
interwoven here in one place. I think this magic should be better commented.
Isn't PriorityQueue a public API, visible to everybody?
As said, I think the HotSpot compiler would be the better place to optimize
those if...else branches.

> But even 100% looks safe...
>

Hm, having oldCapacity = 0x7FFF.FFFE + 100 % makes 0xFFFF.FFFC

>
>> Am I wrong ?
>>
>> 2.) Why don't you prefer a system-wide constant for MAX_ARRAY_SIZE ???
>>
> This should never become a public API - it's a bug in the VM.
>
> I prefer the duplication of code to creating a new external dependency.
>

Good use case for the new super package facility.
I can sympathise with your reservation. On the other hand ...
- if there is a limit, developers should have a chance to check against
it to avoid OutOfMemoryError.
- maybe other VMs have another, much lower limit, e.g. on small mobile systems.
- If the bug would be fixed, who takes care about the "garbage collection" in the code base? - There is many public stuff in sun.misc.VM class, why not MAX_ARRAY_SIZE/maxArraySize()? -Ulf From Ulf.Zibis at gmx.de Tue Mar 9 21:08:15 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 09 Mar 2010 22:08:15 +0100 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <4B96381E.6030902@gmx.de> <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com> Message-ID: <4B96B8BF.2020001@gmx.de> Am 09.03.2010 21:02, schrieb Osvaldo Doederlein: > Should we really consider this a VM bug? I'm not sure that it's a good > idea to allocate a single object which size exceeds 4Gb (for a byte[] > - due to the object header and array size field) - even on a 64-bit > VM. An array with 2^32 elements is impossible, the maximum allowed by > the size field is 2^32-1 which will be just as bad as 2^32-N for any > other tiny positive N, for algorithms that love arrays of [base-2-] > "round" sizes. > > And then if this bug is fixed, it may have slightly different > variations. For a long[] or double[] array, the allocation for the > maximum size would exceed 32Gb, so it exceeds the maximum heap size > for 64-bit HotSpot with CompressedOops. (Ok this is an artificial > issue because we won't like have a 100% free heap, so the only > impediment for "new long[2^32-1]" would be the array header.) > > My suggestion: impose some fixed N (maybe 64, or 0x100, ...), limiting > arrays to 2^32-N (for ANY element type). The artificial restriction > should be large enough to fit the object header of any vendor's JVM, > plus the per-object overhead of any reasonable heap structure. 
This
> limit could be added to the spec, so the implementation is not a bug
> anymore :) and it would be a portable limit. Otherwise, some app may
> work reliably on HotSpot if it relies on the fact that 2^32-5
> positions are possible, but may break on some other vendor's JVM where
> perhaps the implementation limit is 2^32-13 or something else.
>

Please allow me to correct: it's 2^31-N !
...but +1 for your arguments.

In the [base-2-] "round" sense, why is there the "+1" in [1] ?
I think [2] would look best. I'm sure HotSpot would optimize it anyway to
(oldCapacity + (oldCapacity >> 1))
Look at the HotSpot disassembly for String#hashCode(): h*31 becomes (h << 5) - h.

-Ulf

[1] current PriorityQueue snippet:
...
        int newCapacity = ((oldCapacity < 64)?
                           ((oldCapacity + 1) * 2):
                           ((oldCapacity / 2) * 3));
...
[2] new PriorityQueue snippet:
...
        int newCapacity += (oldCapacity < 64) ?
                           oldCapacity : oldCapacity / 2;
...

From martinrb at google.com Tue Mar 9 22:08:46 2010
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 9 Mar 2010 14:08:46 -0800
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <4B96B8BF.2020001@gmx.de>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>
 <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
 <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
 <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
 <4B96381E.6030902@gmx.de>
 <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
 <4B96B8BF.2020001@gmx.de>
Message-ID: <1ccfd1c11003091408g4eb63e1ckd92ef6156846086b@mail.gmail.com>

On Tue, Mar 9, 2010 at 13:08, Ulf Zibis wrote:
>
> [1] current PriorityQueue snippet:
> ...
>         int newCapacity = ((oldCapacity < 64)?
>                            ((oldCapacity + 1) * 2):
>                            ((oldCapacity / 2) * 3));
> ...
> [2] new PriorityQueue snippet:
> ...
>         int newCapacity += (oldCapacity < 64) ?
>                            oldCapacity : oldCapacity / 2;
> ...

Thanks, I took your suggestion and changed it to:

        int newCapacity = oldCapacity + ((oldCapacity < 64) ?
                                         (oldCapacity + 2) :
                                         (oldCapacity >> 1));

Martin

From Ulf.Zibis at gmx.de Tue Mar 9 23:11:06 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 10 Mar 2010 00:11:06 +0100
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003091408g4eb63e1ckd92ef6156846086b@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>
 <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
 <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
 <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
 <4B96381E.6030902@gmx.de>
 <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
 <4B96B8BF.2020001@gmx.de>
 <1ccfd1c11003091408g4eb63e1ckd92ef6156846086b@mail.gmail.com>
Message-ID: <4B96D58A.1050201@gmx.de>

On 09.03.2010 23:08, Martin Buchholz wrote:
> On Tue, Mar 9, 2010 at 13:08, Ulf Zibis wrote:
>
>> [1] current PriorityQueue snippet:
>> ...
>>         int newCapacity = ((oldCapacity < 64)?
>>                            ((oldCapacity + 1) * 2):
>>                            ((oldCapacity / 2) * 3));
>> ...
>> [2] new PriorityQueue snippet:
>> ...
>>         int newCapacity += (oldCapacity < 64) ?
>>                            oldCapacity : oldCapacity / 2;
>> ...
>>
> Thanks, I took your suggestion and changed it to:
>
>         int newCapacity = oldCapacity + ((oldCapacity < 64) ?
>                                          (oldCapacity + 2) :
>                                          (oldCapacity >> 1));
>

Oops :-[
Can you explain the mystery about "+ 2" ?
-Ulf From martinrb at google.com Tue Mar 9 23:22:49 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 9 Mar 2010 15:22:49 -0800 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <4B96D58A.1050201@gmx.de> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> <4B96381E.6030902@gmx.de> <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com> <4B96B8BF.2020001@gmx.de> <1ccfd1c11003091408g4eb63e1ckd92ef6156846086b@mail.gmail.com> <4B96D58A.1050201@gmx.de> Message-ID: <1ccfd1c11003091522l66b7d92ndea2e2105536bb8c@mail.gmail.com> On Tue, Mar 9, 2010 at 15:11, Ulf Zibis wrote: > Can you explain the mystery about "+ 2" ? It's exactly the same as the old resizing behavior. Martin From martinrb at google.com Wed Mar 10 01:05:06 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 9 Mar 2010 17:05:06 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4B92C263.9020404@gmx.de> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> Message-ID: <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> Here's the proposed fix for 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ I changed the name to isBMPCodePoint in preparation for moving it to Character.java. (Sherman, perhaps you would like to take on that followon task?) Sherman, please approve. 
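
[Illustrative sketch, not part of the original message.] The two forms of the BMP test discussed in this thread, the cast `(int) (char) uc == uc` and the shift `uc >> 16 == 0`, can be compared side by side roughly as follows. The class name `BmpCheckSketch`, the method names and the sample values are assumptions made for illustration; only the two expressions themselves come from the thread.

```java
final class BmpCheckSketch {
    // Cast-based form, as quoted in the thread from sun.nio.cs.Surrogate.
    static boolean isBmpByCast(int uc) {
        return (int) (char) uc == uc;
    }

    // Shift-based form, as proposed in the thread for 6931812.
    // The arithmetic shift keeps the sign, so negative inputs never yield 0.
    static boolean isBmpByShift(int uc) {
        return uc >> 16 == 0;
    }

    public static void main(String[] args) {
        // Hypothetical sample values around the BMP boundary and the int extremes.
        int[] samples = {0, 'A', 0xFFFF, 0x10000, 0x10FFFF,
                         -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int uc : samples) {
            System.out.printf("0x%08X cast=%b shift=%b%n",
                              uc, isBmpByCast(uc), isBmpByShift(uc));
        }
    }
}
```

Both methods accept exactly the values 0x0000 through 0xFFFF, so the change is about generated code size rather than behavior.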
Martin On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: > Very fast Sherman, much thanks. > > Could you set the bug to accepted and evaluated, so my patch will have a > chance to get into the code base? > > -Ulf > > > Am 03.03.2010 20:11, schrieb Xueming Shen: >> >> #6931812 >> >> Martin Buchholz wrote: >>> >>> Sherman, would you like to file bugs for Ulf's improvements? >>> >>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>> >>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>> >>>>> Keep in mind that supplementary characters are extremely rare. >>>>> >>>> Yes, but many API's in the JDK are used rarely. >>>> Why should they waste memory footprint / perform bad, particularly if it >>>> doesn't cost anything. >>> >>> I admire your perfectionism. >>> >>>>> Therefore the existing implementation >>>>> >>>>> ?return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>> && ?codePoint<= MAX_CODE_POINT; >>>>> >>>>> will almost always perform just one comparison against a constant, >>>>> which is hard to beat. >>>>> >>>> 1. Wondering: I think there are TWO comparisons. >>>> 2. Those comparisons need to load 32 bit values from machine code, >>>> against >>>> only 8 bit values in my case. >>> >>> It's a good point. ?In the machine code, shifts are likely to use >>> immediate values, and so will be a small win. 
>>> >>> int x = codePoint >>> 16; >>> return x != 0 && x < 0x11; >>> >>> (On modern hardware, these optimizations >>> are less valuable than they used to be; >>> ordinary integer arithmetic is almost free) >>> >>> Martin >> >> > > From gokdogan at gmail.com Wed Mar 10 09:58:38 2010 From: gokdogan at gmail.com (Goktug Gokdogan) Date: Wed, 10 Mar 2010 03:58:38 -0600 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> Message-ID: Similarly, BitSet.ensureCapacity AbstractStringBuilder.expandCapacity Vector.ensureCapacityHelper methods need to have similar checks and/or throw proper exceptions. By the way, I did not understand why IdentityHashMap and HashMap have different MAXIMUM_CAPACITY and different logic to handle resize and overflow. On Mon, Mar 8, 2010 at 8:10 PM, Martin Buchholz wrote: > [Chris or Alan, please review and file a bug] > > OK, guys, > > Here's a patch: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ > > Martin > > On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern > wrote: > > Hi Martin, > > > > Thank you for your reply. If I may, PriorityQueue appears to employ the > > simple strategy that I suggested above in its grow method: > > > > int newCapacity = ((oldCapacity < 64)? > > ((oldCapacity + 1) * 2): > > ((oldCapacity / 2) * 3)); > > if (newCapacity < 0) // overflow > > newCapacity = Integer.MAX_VALUE; > > > > It might be desirable to set a common strategy for capacity increase for > all > > collections. 
> > > > Regards, > > > > Kevin > > > > On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz > wrote: > >> > >> Hi Kevin, > >> > >> As you've noticed, creating objects within a factor of two of > >> their natural limits is a good way to expose lurking bugs. > >> > >> I'm the one responsible for the algorithm in ArrayList. > >> I'm a bit embarrassed, looking at that code today. > >> We could set the array size to Integer.MAX_VALUE, > >> but then you might hit an independent buglet in hotspot > >> that you cannot allocate an array with Integer.MAX_VALUE > >> elements, but Integer.MAX_VALUE - 5 (or so) works. > >> > >> It occurs to me that increasing the size by 50% is better done by > >> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1; > >> > >> I agree with the plan of setting the capacity to something near > >> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize. > >> > >> These bugs are not known. > >> Chris Hegarty, could you file a bug for us? > >> > >> Martin > >> > >> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern > >> wrote: > >> > Greetings, > >> > > >> > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and > >> > java.io.ByteArrayOutputStream which arise when the capacities of the > >> > data > >> > structures reach a particular threshold. More below. > >> > > >> > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its > >> > size > >> > reaches its capacity and an add or an insert operation is invoked, the > >> > capacity is increased by only one element. Notice that in the > following > >> > excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) > * > >> > oldCapacity + 1 unless this value would not suffice to accommodate the > >> > required capacity in which case it is set to the required capacity. 
> If > >> > the > >> > current capacity is at least (2/3)*Integer.MAX_VALUE, then > (oldCapacity > >> > * > >> > 3)/2 + 1 overflows and resolves to a negative number resulting in the > >> > new > >> > capacity being set to the required capacity. The major consequence of > >> > this > >> > is that each subsequent add/insert operation results in a full resize > of > >> > the > >> > ArrayList causing performance to degrade significantly. > >> > > >> > int newCapacity = (oldCapacity * 3)/2 + 1; > >> > if (newCapacity < minCapacity) > >> > newCapacity = minCapacity; > >> > > >> > Hashtable breaks entirely when the size of its backing array reaches > >> > (1/2) * > >> > Integer.MAX_VALUE and a rehash is necessary as is evident from the > >> > following > >> > excerpt from rehash. Notice that rehash will attempt to create an > array > >> > of > >> > negative size if the size of the backing array reaches (1/2) * > >> > Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to > a > >> > negative number. > >> > > >> > int newCapacity = oldCapacity * 2 + 1; > >> > HashtableEntry newTable[] = new HashtableEntry[newCapacity]; > >> > > >> > When the capacity of the backing array in a ByteArrayOutputStream > >> > reaches > >> > (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write > >> > operation is invoked, the capacity of the backing array is increased > >> > only by > >> > the required number of elements. Notice that in the following excerpt > >> > from > >> > ByteArrayOutputStream.write(int) the new backing array capacity is set > >> > to 2 > >> > * buf.length unless this value would not suffice to accommodate the > >> > required > >> > capacity in which case it is set to the required capacity. If the > >> > current > >> > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then > >> > buf.length << 1 overflows and resolves to a negative number resulting > in > >> > the > >> > new capacity being set to the required capacity. 
The major > consequence > >> > of > >> > this, like with ArrayList, is that each subsequent write operation > >> > results > >> > in a full resize of the ByteArrayOutputStream causing performance to > >> > degrade > >> > significantly. > >> > > >> > int newcount = count + 1; > >> > if (newcount > buf.length) { > >> > buf = Arrays.copyOf(buf, Math.max(buf.length << 1, > >> > newcount)); > >> > } > >> > > >> > It is interesting to note that any statements about the amortized time > >> > complexity of add/insert operations, such as the one in the ArrayList > >> > javadoc, are invalidated by the performance related bugs. One > solution > >> > to > >> > the above situations is to set the new capacity of the backing array > to > >> > Integer.MAX_VALUE when the initial size calculation results in a > >> > negative > >> > number during a resize. > >> > > >> > Apologies if these bugs are already known. > >> > > >> > Regards, > >> > > >> > Kevin > >> > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christopher.hegarty at sun.com Wed Mar 10 14:47:32 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Wed, 10 Mar 2010 14:47:32 +0000 Subject: hg: jdk7/tl/jdk: 6933618: java/net/MulticastSocket/NoLoopbackPackets.java fails when rerun Message-ID: <20100310144817.62C2D445BC@hg.openjdk.java.net> Changeset: 47958f76babc Author: chegar Date: 2010-03-10 14:44 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/47958f76babc 6933618: java/net/MulticastSocket/NoLoopbackPackets.java fails when rerun Reviewed-by: alanb ! 
test/java/net/MulticastSocket/NoLoopbackPackets.java

From Ulf.Zibis at gmx.de Wed Mar 10 17:36:48 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 10 Mar 2010 18:36:48 +0100
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <1ccfd1c11003091522l66b7d92ndea2e2105536bb8c@mail.gmail.com>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>
 <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com>
 <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com>
 <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
 <4B96381E.6030902@gmx.de>
 <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
 <4B96B8BF.2020001@gmx.de>
 <1ccfd1c11003091408g4eb63e1ckd92ef6156846086b@mail.gmail.com>
 <4B96D58A.1050201@gmx.de>
 <1ccfd1c11003091522l66b7d92ndea2e2105536bb8c@mail.gmail.com>
Message-ID: <4B97D8B0.7020604@gmx.de>

On 10.03.2010 00:22, Martin Buchholz wrote:
> On Tue, Mar 9, 2010 at 15:11, Ulf Zibis wrote:
>
>
>> Can you explain the mystery about "+ 2" ?
>>
> It's exactly the same as the old resizing behavior.

In detail, I meant to ask whether you have any idea why the original
designers could have chosen the "+1".
The code would be smarter if it were omitted, and it would serve the
algorithm-loved arrays of [base-2-] "round" sizes.

-Ulf

From Ulf.Zibis at gmx.de Wed Mar 10 17:58:11 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 10 Mar 2010 18:58:11 +0100
Subject: Progress of patches
In-Reply-To: <4B96F476.1060409@gmx.de>
References: <4B96E323.80505@gmx.de>
 <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com>
 <4B96F476.1060409@gmx.de>
Message-ID: <4B97DDB3.2080107@gmx.de>

Hi Martin,

there wasn't enough time today, so please wait for tomorrow.

In brief:
- I wouldn't rename to isBMPCodePoint(), because there are many other
names in the Surrogate class that don't match Character, and a usages
search in sun.nio.cs.* or wherever else could then be omitted.
Better add "// return Character.isBMPCodePoint(uc);" as hint for the future. - Thanks for mention me as contributor. - Doesn't the bug description include the addition of isBMPCodePoint() to class Character and the equivalent enhancement to isSupplementaryCodePoint() ? -Ulf Am 10.03.2010 02:23, schrieb Ulf Zibis: > Much thanks Martin, > > I'll do it tomorrow. Now it's time to sleep here in Germany. > > -Ulf > > > Am 10.03.2010 02:10, schrieb Martin Buchholz: >> Do you have a collection of patch files to be upstreamed? >> >> Easiest for me would be a publicly readable >> directory of patches like I maintain on >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ >> >> that I can >> hg qimport >> into my own mq outgoing jdk repo. >> >> (Thanks for your hard work) >> >> Martin >> From Xueming.Shen at Sun.COM Wed Mar 10 18:23:57 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Wed, 10 Mar 2010 10:23:57 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> Message-ID: <4B97E3BD.2000901@sun.com> approved. I don't have a spare ws right now.so please just push, it's almost there:-) sherman Martin Buchholz wrote: > Here's the proposed fix for > 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ > > I changed the name to isBMPCodePoint in preparation for moving > it to Character.java. > (Sherman, perhaps you would like to take on that followon task?) > > Sherman, please approve. 
> > Martin > > On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: > >> Very fast Sherman, much thanks. >> >> Could you set the bug to accepted and evaluated, so my patch will have a >> chance to get into the code base? >> >> -Ulf >> >> >> Am 03.03.2010 20:11, schrieb Xueming Shen: >> >>> #6931812 >>> >>> Martin Buchholz wrote: >>> >>>> Sherman, would you like to file bugs for Ulf's improvements? >>>> >>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>> >>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>>>> >>>>>> Keep in mind that supplementary characters are extremely rare. >>>>>> >>>>>> >>>>> Yes, but many API's in the JDK are used rarely. >>>>> Why should they waste memory footprint / perform bad, particularly if it >>>>> doesn't cost anything. >>>>> >>>> I admire your perfectionism. >>>> >>>> >>>>>> Therefore the existing implementation >>>>>> >>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>> && codePoint<= MAX_CODE_POINT; >>>>>> >>>>>> will almost always perform just one comparison against a constant, >>>>>> which is hard to beat. >>>>>> >>>>>> >>>>> 1. Wondering: I think there are TWO comparisons. >>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>> against >>>>> only 8 bit values in my case. >>>>> >>>> It's a good point. In the machine code, shifts are likely to use >>>> immediate values, and so will be a small win. 
>>>> >>>> int x = codePoint >>> 16; >>>> return x != 0 && x < 0x11; >>>> >>>> (On modern hardware, these optimizations >>>> are less valuable than they used to be; >>>> ordinary integer arithmetic is almost free) >>>> >>>> Martin >>>> >>> >> From martinrb at google.com Wed Mar 10 23:14:43 2010 From: martinrb at google.com (martinrb at google.com) Date: Wed, 10 Mar 2010 23:14:43 +0000 Subject: hg: jdk7/tl/jdk: 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) Message-ID: <20100310231507.8C5B944633@hg.openjdk.java.net> Changeset: 467484e025d6 Author: martin Date: 2010-03-10 14:53 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/467484e025d6 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) Summary: uc >> 16 == 0 is superior to (int) (char) uc == uc Reviewed-by: sherman Contributed-by: Ulf Zibis ! src/share/classes/sun/nio/cs/Surrogate.java From jonathan.gibbons at sun.com Thu Mar 11 00:24:36 2010 From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com) Date: Thu, 11 Mar 2010 00:24:36 +0000 Subject: hg: jdk7/tl/langtools: 6933914: fix missing newlines Message-ID: <20100311002440.1FD1244644@hg.openjdk.java.net> Changeset: 9871ce4fd56f Author: jjg Date: 2010-03-10 16:23 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/9871ce4fd56f 6933914: fix missing newlines Reviewed-by: ohair ! test/tools/javac/OverrideChecks/6738538/T6738538a.java ! test/tools/javac/OverrideChecks/6738538/T6738538b.java ! test/tools/javac/api/6731573/Erroneous.java ! test/tools/javac/api/6731573/T6731573.java ! test/tools/javac/cast/6548436/T6548436d.java ! test/tools/javac/cast/6558559/T6558559a.java ! test/tools/javac/cast/6558559/T6558559b.java ! test/tools/javac/cast/6586091/T6586091.java ! test/tools/javac/enum/T6724345.java ! test/tools/javac/generics/T6557954.java ! test/tools/javac/generics/T6751514.java ! test/tools/javac/generics/T6869075.java ! test/tools/javac/generics/inference/6569789/T6569789.java ! 
test/tools/javac/generics/inference/6650759/T6650759a.java
! test/tools/javac/generics/wildcards/T6732484.java
! test/tools/javac/processing/model/util/elements/Foo.java
! test/tools/javac/varargs/T6746184.java
- test/tools/javap/T6305779.java
! test/tools/javap/T6715251.java
! test/tools/javap/T6715753.java

From martinrb at google.com Thu Mar 11 00:48:10 2010
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 10 Mar 2010 16:48:10 -0800
Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
In-Reply-To: <4B97D8B0.7020604@gmx.de>
References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com>
 <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com>
 <4B96381E.6030902@gmx.de>
 <1ccfd1c11003091118m3a0106cap88c7af89d01b6bf8@mail.gmail.com>
 <4B96B8BF.2020001@gmx.de>
 <1ccfd1c11003091408g4eb63e1ckd92ef6156846086b@mail.gmail.com>
 <4B96D58A.1050201@gmx.de>
 <1ccfd1c11003091522l66b7d92ndea2e2105536bb8c@mail.gmail.com>
 <4B97D8B0.7020604@gmx.de>
Message-ID: <1ccfd1c11003101648v661c69d6n270dfde650cbdaaa@mail.gmail.com>

On Wed, Mar 10, 2010 at 09:36, Ulf Zibis wrote:
> On 10.03.2010 00:22, Martin Buchholz wrote:
>>
>> On Tue, Mar 9, 2010 at 15:11, Ulf Zibis wrote:
>>
>>
>>>
>>> Can you explain the mystery about "+ 2" ?
>>>
>>
>> It's exactly the same as the old resizing behavior.
>
> In detail I meant, if you have any idea, why the original designers could
> have chosen the "+1".
> The code would be smarter, if omitted, + would serve the algorithm-loved
> arrays of [base-2-] "round" sizes.

I bet what happened is that the +2 is necessary for
an initial capacity of 0. It turns out that the current
implementation disallows this, so it is possible
to simply double the size, but I am not going to change it now.

On the other hand, you could consider it a feature that very small arrays
should grow more rapidly than a factor of two.
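
[Illustrative sketch, not part of the original message.] The effect of the "+ 2" on small capacities can be seen by putting the old and the rewritten PriorityQueue growth expressions from this thread next to each other. The class and method names below are hypothetical; only the two expressions are taken from the messages above.

```java
final class PriorityQueueGrowthSketch {
    // Old expression, as quoted in the thread.
    static int oldGrowth(int oldCapacity) {
        return (oldCapacity < 64)
                ? ((oldCapacity + 1) * 2)
                : ((oldCapacity / 2) * 3);
    }

    // Rewritten expression, as posted in the thread.
    static int newGrowth(int oldCapacity) {
        return oldCapacity + ((oldCapacity < 64)
                ? (oldCapacity + 2)
                : (oldCapacity >> 1));
    }

    public static void main(String[] args) {
        // Hypothetical sample capacities around the 64-element threshold.
        for (int c : new int[] {0, 1, 2, 63, 64, 65}) {
            System.out.println(c + " -> old " + oldGrowth(c) + ", new " + newGrowth(c));
        }
    }
}
```

Below 64 the two expressions are algebraically identical, since (c + 1) * 2 equals c + c + 2, and a capacity of 0 still grows to 2. At 64 and above they agree for even capacities; for odd capacities the rewritten form is larger by one, because `(oldCapacity / 2) * 3` truncates before multiplying.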
Martin From martinrb at google.com Thu Mar 11 01:03:32 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 10 Mar 2010 17:03:32 -0800 Subject: Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream In-Reply-To: References: <1704b7a21003031741m734545f1gb0170ed5fa6f6d68@mail.gmail.com> <1ccfd1c11003050104u61e776apc5fe2e5ec08e3dc0@mail.gmail.com> <1704b7a21003050248k1e893cedmd14f26cbecd45896@mail.gmail.com> <1ccfd1c11003081810u54fb22e6k25230f4eb5ca1b18@mail.gmail.com> Message-ID: <1ccfd1c11003101703g42708603j1cedeeae8c74b68c@mail.gmail.com> On Wed, Mar 10, 2010 at 01:58, Goktug Gokdogan wrote: > Similarly, > ??BitSet.ensureCapacity I don't think BitSet has this problem, because the bits are stored in longs, so the array can never overflow. But don't believe me - prove me wrong! > ??AbstractStringBuilder.expandCapacity Yup. > ??Vector.ensureCapacityHelper Yup. The scope of this fix is growing... > methods need to have similar checks and/or throw proper exceptions. > > By the way, I did not understand why?IdentityHashMap and HashMap have > different MAXIMUM_CAPACITY and different logic to handle resize and > overflow. These two classes store their data in completely different ways. In particular, HashMap need never fail because of overflow ; Integer.MAX_VALUE buckets should be enough for anybody! Webrev regenerated. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ Martin > > On Mon, Mar 8, 2010 at 8:10 PM, Martin Buchholz wrote: >> >> [Chris or Alan, please review and file a bug] >> >> OK, guys, >> >> Here's a patch: >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/ >> >> Martin >> >> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern >> wrote: >> > Hi Martin, >> > >> > Thank you for your reply.? If I may, PriorityQueue appears to employ the >> > simple strategy that I suggested above in its grow method: >> > >> > ??????? int newCapacity = ((oldCapacity < 64)? >> > ?????????????????????????? 
((oldCapacity + 1) * 2): >> > ?????????????????????????? ((oldCapacity / 2) * 3)); >> > ??????? if (newCapacity < 0) // overflow >> > ??????????? newCapacity = Integer.MAX_VALUE; >> > >> > It might be desirable to set a common strategy for capacity increase for >> > all >> > collections. >> > >> > Regards, >> > >> > Kevin >> > >> > On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz >> > wrote: >> >> >> >> Hi Kevin, >> >> >> >> As you've noticed, creating objects within a factor of two of >> >> their natural limits is a good way to expose lurking bugs. >> >> >> >> I'm the one responsible for the algorithm in ArrayList. >> >> I'm a bit embarrassed, looking at that code today. >> >> We could set the array size to Integer.MAX_VALUE, >> >> but then you might hit an independent buglet in hotspot >> >> that you cannot allocate an array with Integer.MAX_VALUE >> >> elements, but Integer.MAX_VALUE - 5 (or so) works. >> >> >> >> It occurs to me that increasing the size by 50% is better done by >> >> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1; >> >> >> >> I agree with the plan of setting the capacity to something near >> >> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize. >> >> >> >> These bugs are not known. >> >> Chris Hegarty, could you file a bug for us? >> >> >> >> Martin >> >> >> >> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern >> >> wrote: >> >> > Greetings, >> >> > >> >> > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and >> >> > java.io.ByteArrayOutputStream which arise when the capacities of the >> >> > data >> >> > structures reach a particular threshold.? More below. >> >> > >> >> > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its >> >> > size >> >> > reaches its capacity and an add or an insert operation is invoked, >> >> > the >> >> > capacity is increased by only one element.? 
Notice that in the >> >> > following >> >> > excerpt from ArrayList.ensureCapacity the new capacity is set to >> >> > (3/2) * >> >> > oldCapacity + 1 unless this value would not suffice to accommodate >> >> > the >> >> > required capacity in which case it is set to the required capacity. >> >> > If >> >> > the >> >> > current capacity is at least (2/3)*Integer.MAX_VALUE, then >> >> > (oldCapacity >> >> > * >> >> > 3)/2 + 1 overflows and resolves to a negative number resulting in the >> >> > new >> >> > capacity being set to the required capacity. The major consequence >> >> > of >> >> > this >> >> > is that each subsequent add/insert operation results in a full resize >> >> > of >> >> > the >> >> > ArrayList causing performance to degrade significantly. >> >> > >> >> > int newCapacity = (oldCapacity * 3)/2 + 1; >> >> > if (newCapacity < minCapacity) >> >> > newCapacity = minCapacity; >> >> > >> >> > Hashtable breaks entirely when the size of its backing array reaches >> >> > (1/2) * >> >> > Integer.MAX_VALUE and a rehash is necessary as is evident from the >> >> > following >> >> > excerpt from rehash. Notice that rehash will attempt to create an >> >> > array >> >> > of >> >> > negative size if the size of the backing array reaches (1/2) * >> >> > Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to >> >> > a >> >> > negative number. >> >> > >> >> > int newCapacity = oldCapacity * 2 + 1; >> >> > HashtableEntry newTable[] = new HashtableEntry[newCapacity]; >> >> > >> >> > When the capacity of the backing array in a ByteArrayOutputStream >> >> > reaches >> >> > (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write >> >> > operation is invoked, the capacity of the backing array is increased >> >> > only by >> >> > the required number of elements. 
Notice that in the following >> >> > excerpt >> >> > from >> >> > ByteArrayOutputStream.write(int) the new backing array capacity is >> >> > set >> >> > to 2 >> >> > * buf.length unless this value would not suffice to accommodate the >> >> > required >> >> > capacity in which case it is set to the required capacity. If the >> >> > current >> >> > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, >> >> > then >> >> > buf.length << 1 overflows and resolves to a negative number resulting >> >> > in >> >> > the >> >> > new capacity being set to the required capacity. The major >> >> > consequence >> >> > of >> >> > this, like with ArrayList, is that each subsequent write operation >> >> > results >> >> > in a full resize of the ByteArrayOutputStream causing performance to >> >> > degrade >> >> > significantly. >> >> > >> >> > int newcount = count + 1; >> >> > if (newcount > buf.length) { >> >> > buf = Arrays.copyOf(buf, Math.max(buf.length << 1, >> >> > newcount)); >> >> > } >> >> > >> >> > It is interesting to note that any statements about the amortized >> >> > time >> >> > complexity of add/insert operations, such as the one in the ArrayList >> >> > javadoc, are invalidated by the performance related bugs. One >> >> > solution >> >> > to >> >> > the above situations is to set the new capacity of the backing array >> >> > to >> >> > Integer.MAX_VALUE when the initial size calculation results in a >> >> > negative >> >> > number during a resize. >> >> > >> >> > Apologies if these bugs are already known. 
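Editorial illustration of the report above: a minimal sketch of an overflow-aware growth step, assuming the roughly-50% growth form `oldCapacity + (oldCapacity >> 1) + 1` mentioned earlier in the thread and a cap slightly below Integer.MAX_VALUE. It is not the code in the ArrayResize webrev; the class `GrowthSketch`, the method `newCapacity`, and the constant `MAX_ARRAY_SIZE` (including its value) are hypothetical names chosen for this example.

```java
final class GrowthSketch {
    // Hypothetical cap: some VMs cannot allocate an array of exactly
    // Integer.MAX_VALUE elements, so stay a little below it.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Returns a capacity of at least minCapacity, growing by about 50% when possible. */
    static int newCapacity(int oldCapacity, int minCapacity) {
        if (minCapacity < 0) {
            // The caller's own size arithmetic has already wrapped around.
            throw new OutOfMemoryError("required capacity overflows int");
        }
        int grown = oldCapacity + (oldCapacity >> 1) + 1;
        if (grown < 0 || grown > MAX_ARRAY_SIZE) {
            // The sum wrapped negative or passed the cap: clamp instead of
            // silently falling back to "grow by one element".
            grown = MAX_ARRAY_SIZE;
        }
        // A request above the cap is passed through; the allocation itself may then fail.
        return Math.max(grown, minCapacity);
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11));
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001));
    }
}
```

With this shape, a list that is already past two thirds of the int range jumps once to the cap rather than resizing on every subsequent add.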
>> >> > >> >> > Regards, >> >> > >> >> > Kevin >> >> > >> > >> > > > From martinrb at google.com Thu Mar 11 01:59:31 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 10 Mar 2010 17:59:31 -0800 Subject: Progress of patches In-Reply-To: <4B97DDB3.2080107@gmx.de> References: <4B96E323.80505@gmx.de> <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com> <4B96F476.1060409@gmx.de> <4B97DDB3.2080107@gmx.de> Message-ID: <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> On Wed, Mar 10, 2010 at 09:58, Ulf Zibis wrote: > Hi Martin, > > there wasn't enough time today, so please wait for tomorrow. > > In brief: > - I wouldn't rename to isBMPCodePoint(), because there are many other names > in Surrogate class that don't sync to Character and a usages search in > sun.nio.cs.* or wherever else could be omitted. Better add "// return > Character.isBMPCodePoint(uc);" as hint for the future. > - Thanks for mentioning me as contributor. > - Doesn't the bug description include the addition of isBMPCodePoint() to > class Character and the equivalent enhancement to isSupplementaryCodePoint() > ? Sorry, I should have included the fix to isSupplementaryCodePoint() in the last fix. 
Here's the next fix: http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ 6666666: A better implementation of Character.isSupplementaryCodePoint Summary: Clever bit-twiddling saves a few bytes of machine code Reviewed-by: sherman Contributed-by: Ulf Zibis diff --git a/src/share/classes/java/lang/Character.java b/src/share/classes/java/lang/Character.java --- a/src/share/classes/java/lang/Character.java +++ b/src/share/classes/java/lang/Character.java @@ -2693,8 +2693,8 @@ * @since 1.5 */ public static boolean isSupplementaryCodePoint(int codePoint) { - return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT - && codePoint <= MAX_CODE_POINT; + int plane = codePoint >>> 16; + return plane != 0 && plane < ((MAX_CODE_POINT + 1) >>> 16); } /** From martinrb at google.com Thu Mar 11 04:42:09 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 10 Mar 2010 20:42:09 -0800 Subject: Progress of patches In-Reply-To: <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> References: <4B96E323.80505@gmx.de> <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com> <4B96F476.1060409@gmx.de> <4B97DDB3.2080107@gmx.de> <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> Message-ID: <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> I couldn't resist making a similar change to isValidCodePoint. @@ -2678,7 +2678,8 @@ * @since 1.5 */ public static boolean isValidCodePoint(int codePoint) { - return codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT; + int plane = codePoint >>> 16; + return plane < ((MAX_CODE_POINT + 1) >>> 16); } /** This is a more important optimization, since isValidCodePoint almost always requires two compares, and this reduces it to one. 
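Editorial illustration of the one-compare claim: a small self-contained check that the plane-based tests from the two diffs above agree with the original range comparisons, including for negative and out-of-range inputs. The class `PlaneCheck` and its method names are made up for this sketch; only the method bodies follow the diffs.

```java
final class PlaneCheck {
    static final int MAX = Character.MAX_CODE_POINT; // 0x10FFFF

    static boolean isValidByRange(int cp) {
        return cp >= Character.MIN_CODE_POINT && cp <= MAX;
    }

    static boolean isValidByPlane(int cp) {
        // Unsigned shift: a negative input yields a large plane number,
        // so a single upper-bound compare rejects it as well.
        int plane = cp >>> 16;
        return plane < ((MAX + 1) >>> 16);
    }

    static boolean isSupplementaryByRange(int cp) {
        return cp >= Character.MIN_SUPPLEMENTARY_CODE_POINT && cp <= MAX;
    }

    static boolean isSupplementaryByPlane(int cp) {
        int plane = cp >>> 16;
        return plane != 0 && plane < ((MAX + 1) >>> 16);
    }

    /** True when both formulations agree for the given input. */
    static boolean agree(int cp) {
        return isValidByRange(cp) == isValidByPlane(cp)
            && isSupplementaryByRange(cp) == isSupplementaryByPlane(cp);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 'A', 0xFFFF, 0x10000,
                         0x10FFFF, 0x110000, Integer.MAX_VALUE};
        for (int cp : samples) {
            if (!agree(cp)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
            }
        }
        System.out.println("plane checks agree with range checks");
    }
}
```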
(Still, none of these are really important, and no one will notice) http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ Martin On Wed, Mar 10, 2010 at 17:59, Martin Buchholz wrote: > On Wed, Mar 10, 2010 at 09:58, Ulf Zibis wrote: >> Hi Martin, >> >> there wasn't enough time today, so please wait for tomorrow. >> >> In brief: >> - I wouldn't rename to isBMPCodePoint(), because there are many other names >> in Surrogate class that don't sync to Character and a usages search in >> sun.nio.cs.* or wherever else could be omitted. Better add "// return >> Character.isBMPCodePoint(uc);" as hint for the future. >> - Thanks for mentioning me as contributor. >> - Doesn't the bug description include the addition of isBMPCodePoint() to >> class Character and the equivalent enhancement to isSupplementaryCodePoint() >> ? > > Sorry, I should have included the fix to isSupplementaryCodePoint() > in the last fix. > > Here's the next fix: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ > > 6666666: A better implementation of Character.isSupplementaryCodePoint > Summary: Clever bit-twiddling saves a few bytes of machine code > Reviewed-by: sherman > Contributed-by: Ulf Zibis > diff --git a/src/share/classes/java/lang/Character.java > b/src/share/classes/java/lang/Character.java > --- a/src/share/classes/java/lang/Character.java > +++ b/src/share/classes/java/lang/Character.java > @@ -2693,8 +2693,8 @@ > * @since 1.5 > */ > public static boolean isSupplementaryCodePoint(int codePoint) { > - return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT > - && codePoint <= MAX_CODE_POINT; > + int plane = codePoint >>> 16; > + return plane != 0 && plane < ((MAX_CODE_POINT + 1) >>> 16); > } > > 
/** > From Xueming.Shen at Sun.COM Thu Mar 11 06:47:44 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Wed, 10 Mar 2010 22:47:44 -0800 Subject: Codereview needed for #6929479 In-Reply-To: <4B86D763.8080603@sun.com> References: <4B86CAE8.3080008@sun.com> <4B86D43D.4000002@sun.com> <4B86D6C4.4080600@sun.com> <4B86D763.8080603@sun.com> Message-ID: <4B989210.4020303@sun.com> Alan, webrev has been updated to use the sun.zip.disableMemoryMapping http://cr.openjdk.java.net/~sherman/6929479/webrev Please review. Thanks, Sherman Alan Bateman wrote: > Xueming Shen wrote: >> : >> The webrev has been updated to use "sun.zip.disableMmapping", I guess >> you meant "sun.zip.disableMmapping", right? > It's hard to find a good name. I was suggesting disableMapping (no > double m) but disableMemoryMapping could work too. > > -Alan. From Alan.Bateman at Sun.COM Thu Mar 11 15:34:43 2010 From: Alan.Bateman at Sun.COM (Alan Bateman) Date: Thu, 11 Mar 2010 15:34:43 +0000 Subject: Codereview needed for #6929479 In-Reply-To: <4B989210.4020303@sun.com> References: <4B86CAE8.3080008@sun.com> <4B86D43D.4000002@sun.com> <4B86D6C4.4080600@sun.com> <4B86D763.8080603@sun.com> <4B989210.4020303@sun.com> Message-ID: <4B990D93.9050806@sun.com> Xueming Shen wrote: > Alan, > > webrev has been updated to use the sun.zip.disableMemoryMapping > > http://cr.openjdk.java.net/~sherman/6929479/webrev > > Please review. > > Thanks, > Sherman I agree it's a better name. In ZipFile it would be good to put a comment at the initialization so that the reader understands what this property is about. Minor nit in zip_util.c at L805 is that it looks like the indenting it out by one. In any case, this will be a useful debugging option for the next time that someone steps on their own feet. 
-Alan From christopher.hegarty at sun.com Thu Mar 11 16:20:14 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Thu, 11 Mar 2010 16:20:14 +0000 Subject: hg: jdk7/tl/jdk: 6934054: java/net/Socket/FDClose.java return error in samevm Message-ID: <20100311162054.0273244735@hg.openjdk.java.net> Changeset: 07e1c5a90c6a Author: chegar Date: 2010-03-11 16:17 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/07e1c5a90c6a 6934054: java/net/Socket/FDClose.java return error in samevm Summary: test is no longer useful Reviewed-by: alanb ! test/ProblemList.txt - test/java/net/Socket/FDClose.java From christopher.hegarty at sun.com Thu Mar 11 17:39:36 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Thu, 11 Mar 2010 17:39:36 +0000 Subject: hg: jdk7/tl/jdk: 6933629: java/net/HttpURLConnection/HttpResponseCode.java fails if run in samevm mode Message-ID: <20100311173955.C60B444747@hg.openjdk.java.net> Changeset: c342735a3e58 Author: chegar Date: 2010-03-11 17:37 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/c342735a3e58 6933629: java/net/HttpURLConnection/HttpResponseCode.java fails if run in samevm mode Reviewed-by: alanb ! test/ProblemList.txt ! test/java/net/CookieHandler/CookieHandlerTest.java From christopher.hegarty at sun.com Thu Mar 11 17:51:09 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Thu, 11 Mar 2010 17:51:09 +0000 Subject: hg: jdk7/tl/jdk: 6223635: Code hangs at connect call even when Timeout is specified when using a socks proxy Message-ID: <20100311175127.DA7B84474C@hg.openjdk.java.net> Changeset: c6f8c58ed51a Author: chegar Date: 2010-03-11 17:50 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/c6f8c58ed51a 6223635: Code hangs at connect call even when Timeout is specified when using a socks proxy Reviewed-by: michaelm, chegar Contributed-by: damjan.jov at gmail.com ! src/share/classes/java/net/SocketInputStream.java ! 
src/share/classes/java/net/SocksSocketImpl.java + test/java/net/Socket/SocksConnectTimeout.java From Ulf.Zibis at gmx.de Thu Mar 11 17:53:43 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 11 Mar 2010 18:53:43 +0100 Subject: Progress of patches In-Reply-To: <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> References: <4B96E323.80505@gmx.de> <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com> <4B96F476.1060409@gmx.de> <4B97DDB3.2080107@gmx.de> <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> Message-ID: <4B992E27.9000205@gmx.de> I couldn't resist too ;-) . See: https://bugs.openjdk.java.net/attachment.cgi?id=178&action=diff Download: https://bugs.openjdk.java.net/attachment.cgi?id=178 Please have in mind: - the performance advantage as pair only occurs, if isBMPCodePoint too uses logical shift '>>>'. - String(int[] codePoints, int offset, int count) would have to access sun.nio.cs.Surrogate, if isBMPCodePoint doesn't exist in Character. Hopefully you can agree all my changes, -Ulf Am 11.03.2010 05:42, schrieb Martin Buchholz: > I couldn't resist making a similar change to isValidCodePoint. > > @@ -2678,7 +2678,8 @@ > * @since 1.5 > */ > public static boolean isValidCodePoint(int codePoint) { > - return codePoint>= MIN_CODE_POINT&& codePoint<= MAX_CODE_POINT; > + int plane = codePoint>>> 16; > + return plane< ((MAX_CODE_POINT + 1)>>> 16); > } > > /** > > This is a more important optimization, since isValidCodePoint > almost always requires two compares, and this reduces it to one. > (Still, none of these are really important, and no one will notice) > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ > > Martin > > On Wed, Mar 10, 2010 at 17:59, Martin Buchholz wrote: > >> On Wed, Mar 10, 2010 at 09:58, Ulf Zibis wrote: >> >>> Hi Martin, >>> >>> there wasn't enough time today, so please wait for tomorrow. 
>>> >>> In brief: >>> - I wouldn't rename to isBMPCodePoint(), because there are many other names >>> in Surrogate class that don't sync to Character and and a usages search in >>> sun.nio.cs.* or where ever else could be omitted. Better add "// return >>> Character.isBMPCodePoint(uc);" as hint for the future. >>> - Thanks for mention me as contributor. >>> - Doesn't the bug description include the addition of isBMPCodePoint() to >>> class Character and the equivalent enhancement to isSupplementaryCodePoint() >>> ? >>> >> Sorry, I should have included the fix to isSupplementaryCodePoint() >> in the last fix. >> >> Here's the next fix: >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ >> >> 6666666: A better implementation of Character.isSupplementaryCodePoint >> Summary: Clever bit-twiddling saves a few bytes of machine code >> Reviewed-by: sherman >> Contributed-by: Ulf Zibis >> diff --git a/src/share/classes/java/lang/Character.java >> b/src/share/classes/java/lang/Character.java >> --- a/src/share/classes/java/lang/Character.java >> +++ b/src/share/classes/java/lang/Character.java >> @@ -2693,8 +2693,8 @@ >> * @since 1.5 >> */ >> public static boolean isSupplementaryCodePoint(int codePoint) { >> - return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >> -&& codePoint<= MAX_CODE_POINT; >> + int plane = codePoint>>> 16; >> + return plane != 0&& plane< ((MAX_CODE_POINT + 1)>>> 16); >> } >> >> /** >> >> > > From Ulf.Zibis at gmx.de Thu Mar 11 18:25:01 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 11 Mar 2010 19:25:01 +0100 Subject: Progress of patches In-Reply-To: <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> References: <4B96E323.80505@gmx.de> <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com> <4B96F476.1060409@gmx.de> <4B97DDB3.2080107@gmx.de> <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> Message-ID: <4B99357D.5090501@gmx.de> 
Am 11.03.2010 05:42, schrieb Martin Buchholz: > I couldn't resist making a similar change to isValidCodePoint. > > @@ -2678,7 +2678,8 @@ > * @since 1.5 > */ > public static boolean isValidCodePoint(int codePoint) { > - return codePoint>= MIN_CODE_POINT&& codePoint<= MAX_CODE_POINT; > + int plane = codePoint>>> 16; > + return plane< ((MAX_CODE_POINT + 1)>>> 16); > } > > /** > > This is a more important optimization, since isValidCodePoint > almost always requires two compares, and this reduces it to one. > Why isn't this true for isSupplementaryCodePoint() too ? Particularly there the "cheap" compare against 0 can't be benefited. > (Still, none of these are really important, and no one will notice) > Maybe in String(int[] codePoints, int offset, int count) or in numerous sun.nio.cs charset coders which use these methods consecutive in loop on each unicode character. -Ulf From Ulf.Zibis at gmx.de Thu Mar 11 18:32:30 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 11 Mar 2010 19:32:30 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B97E3BD.2000901@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> Message-ID: <4B99373E.40502@gmx.de> Sherman, I know, your time ... ... but maybe someone is needed for sponsor here: https://bugs.openjdk.java.net/show_bug.cgi?id=100132 Could you do this? Much thanks, -Ulf Am 10.03.2010 19:23, schrieb Xueming Shen: > approved. 
> > I don't have a spare ws right now.so please just push, it's almost > there:-) > > sherman > > Martin Buchholz wrote: >> Here's the proposed fix for >> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ >> >> I changed the name to isBMPCodePoint in preparation for moving >> it to Character.java. >> (Sherman, perhaps you would like to take on that followon task?) >> >> Sherman, please approve. >> >> Martin >> >> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: >>> Very fast Sherman, much thanks. >>> >>> Could you set the bug to accepted and evaluated, so my patch will >>> have a >>> chance to get into the code base? >>> >>> -Ulf >>> >>> >>> Am 03.03.2010 20:11, schrieb Xueming Shen: >>>> #6931812 >>>> >>>> Martin Buchholz wrote: >>>>> Sherman, would you like to file bugs for Ulf's improvements? >>>>> >>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>>>>>> Keep in mind that supplementary characters are extremely rare. >>>>>>> >>>>>> Yes, but many API's in the JDK are used rarely. >>>>>> Why should they waste memory footprint / perform bad, >>>>>> particularly if it >>>>>> doesn't cost anything. >>>>> I admire your perfectionism. >>>>> >>>>>>> Therefore the existing implementation >>>>>>> >>>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>>> && codePoint<= MAX_CODE_POINT; >>>>>>> >>>>>>> will almost always perform just one comparison against a constant, >>>>>>> which is hard to beat. >>>>>>> >>>>>> 1. Wondering: I think there are TWO comparisons. >>>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>>> against >>>>>> only 8 bit values in my case. >>>>> It's a good point. In the machine code, shifts are likely to use >>>>> immediate values, and so will be a small win. 
>>>>> >>>>> int x = codePoint >>> 16; >>>>> return x != 0 && x < 0x11; >>>>> >>>>> (On modern hardware, these optimizations >>>>> are less valuable than they used to be; >>>>> ordinary integer arithmetic is almost free) >>>>> >>>>> Martin > > From martinrb at google.com Thu Mar 11 19:38:54 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 11 Mar 2010 11:38:54 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B99373E.40502@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> Message-ID: <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> Ulf, your changes would be easier to get in if they were organized as mq patch files that could be qimported into an existing mq repo. I've done that below, which includes a subset of your own proposed changes: http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ Sherman (or Alan), please review and/or file bugs for the above changes. isBMPCodePoint is a spec addition, requiring additional paperwork. Sherman, you owe me a response to my now-moldy proposed changes to the UTF-8 charset. The only controversial change would be the change in behavior in malformed-utf8, which I can take out. Martin On Thu, Mar 11, 2010 at 10:32, Ulf Zibis wrote: > Sherman, > > I know, your time ... > > ... 
but maybe someone is needed for sponsor here: > https://bugs.openjdk.java.net/show_bug.cgi?id=100132 > > Could you do this? > > Much thanks, > > -Ulf > > > Am 10.03.2010 19:23, schrieb Xueming Shen: >> >> approved. >> >> I don't have a spare ws right now, so please just push, it's almost >> there:-) >> >> sherman >> >> Martin Buchholz wrote: >>> >>> Here's the proposed fix for >>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) >>> >>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ >>> >>> I changed the name to isBMPCodePoint in preparation for moving >>> it to Character.java. >>> (Sherman, perhaps you would like to take on that followon task?) >>> >>> Sherman, please approve. >>> >>> Martin >>> >>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: >>>> >>>> Very fast Sherman, much thanks. >>>> >>>> Could you set the bug to accepted and evaluated, so my patch will have a >>>> chance to get into the code base? >>>> >>>> -Ulf >>>> >>>> >>>> Am 03.03.2010 20:11, schrieb Xueming Shen: >>>>> >>>>> #6931812 >>>>> >>>>> Martin Buchholz wrote: >>>>>> >>>>>> Sherman, would you like to file bugs for Ulf's improvements? >>>>>> >>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>>>>> >>>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>>>>>>> >>>>>>>> Keep in mind that supplementary characters are extremely rare. >>>>>>>> >>>>>>> Yes, but many API's in the JDK are used rarely. >>>>>>> Why should they waste memory footprint / perform bad, particularly if >>>>>>> it >>>>>>> doesn't cost anything. >>>>>> >>>>>> I admire your perfectionism. >>>>>> >>>>>>>> Therefore the existing implementation >>>>>>>> >>>>>>>> return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT >>>>>>>> && codePoint <= MAX_CODE_POINT; >>>>>>>> >>>>>>>> will almost always perform just one comparison against a constant, >>>>>>>> which is hard to beat. >>>>>>>> >>>>>>> 1. Wondering: I think there are TWO comparisons. >>>>>>> 2. 
Those comparisons need to load 32 bit values from machine code, >>>>>>> against >>>>>>> only 8 bit values in my case. >>>>>> >>>>>> It's a good point. In the machine code, shifts are likely to use >>>>>> immediate values, and so will be a small win. >>>>>> >>>>>> int x = codePoint >>> 16; >>>>>> return x != 0 && x < 0x11; >>>>>> >>>>>> (On modern hardware, these optimizations >>>>>> are less valuable than they used to be; >>>>>> ordinary integer arithmetic is almost free) >>>>>> >>>>>> Martin >> >> > > From martinrb at google.com Thu Mar 11 19:56:05 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 11 Mar 2010 11:56:05 -0800 Subject: Progress of patches In-Reply-To: <4B99357D.5090501@gmx.de> References: <4B96E323.80505@gmx.de> <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com> <4B96F476.1060409@gmx.de> <4B97DDB3.2080107@gmx.de> <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> <4B99357D.5090501@gmx.de> Message-ID: <1ccfd1c11003111156u40854e62ybba922383d636cb9@mail.gmail.com> On Thu, Mar 11, 2010 at 10:25, Ulf Zibis wrote: > Am 11.03.2010 05:42, schrieb Martin Buchholz: >> >> I couldn't resist making a similar change to isValidCodePoint. >> >> @@ -2678,7 +2678,8 @@ >> * @since 1.5 >> */ >> public static boolean isValidCodePoint(int codePoint) { >> - return codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT; >> + int plane = codePoint >>> 16; >> + return plane < ((MAX_CODE_POINT + 1) >>> 16); >> } >> >> /** >> >> This is a more important optimization, since isValidCodePoint >> almost always requires two compares, and this reduces it to one. >> > > Why isn't this true for isSupplementaryCodePoint() too ? > Particularly there the "cheap" compare against 0 can't be benefited. 
Because almost all code points are actually BMP, the naive implementation of isValidCodePoint will almost always require one more branch than isSupplementaryCodePoint, making it more valuable to optimize. >> (Still, none of these are really important, and no one will notice) Martin From Ulf.Zibis at gmx.de Thu Mar 11 20:19:33 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 11 Mar 2010 21:19:33 +0100 Subject: Progress of patches In-Reply-To: <1ccfd1c11003111156u40854e62ybba922383d636cb9@mail.gmail.com> References: <4B96E323.80505@gmx.de> <1ccfd1c11003091710u1e2ec8cevf64110ee3af2d88b@mail.gmail.com> <4B96F476.1060409@gmx.de> <4B97DDB3.2080107@gmx.de> <1ccfd1c11003101759g5f28ec2dhfd1a220ed6758880@mail.gmail.com> <1ccfd1c11003102042n192885b1ld3859b0f5311e732@mail.gmail.com> <4B99357D.5090501@gmx.de> <1ccfd1c11003111156u40854e62ybba922383d636cb9@mail.gmail.com> Message-ID: <4B995055.60700@gmx.de> Am 11.03.2010 20:56, schrieb Martin Buchholz: > >> Why isn't this true for isSupplementaryCodePoint() too ? >> Particularly there the "cheap" compare against 0 can't be benefited. >> > Because almost all code points are actually BMP, > the naive implementation of isValidCodePoint will > almost always require one more branch than > isSupplementaryCodePoint, > making it more valuable to optimize. Thanks, now it's clear what you meant. My disadvantage not being so familiar with english here in Germany. 
-Ulf From Ulf.Zibis at gmx.de Thu Mar 11 21:14:10 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 11 Mar 2010 22:14:10 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> Message-ID: <4B995D22.2020507@gmx.de> Am 11.03.2010 20:38, schrieb Martin Buchholz: > Ulf, your changes would be easier to get in > if they were organized as mq patch files that > could be qimported into an existing mq repo. > To be honest, I never heard about mq. Can you point me to some docs please? > I've done that below, which includes a subset of > your own proposed changes: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ > - Maybe better: "... using a single {@code char}". - Why don't you like using the new isBMPCodePoint() for isSupplementaryCodePoint() and toUpperCaseCharArray() ? - Same shift magic would enhance isISOControl(), isHighSurrogate(), isLowSurrogate(), in particular if latter occur consecutive. 8-bit shift + compare would allow HotSpot to compile to smart 1-byte immediate op-codes. - Don't you think my notes on validity are worth to add. (or separate bug ?) - Changing ch <= MAX_SURROGATE to ch < MAX_SURROGATE + 1 would allow HotSpot compiler to optimize 1 branch if those methods are used consecutive. - And at last, I would like to make the constants complete (= adding MAX_SUPPLEMENTARY_CODE_POINT). 
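Editorial illustration of the list above: the shift-based checks it suggests might look roughly like the following. This is a sketch, not the attached patch; the class `ShiftChecks`, the method names, and the shift widths (16 bits for the BMP test, 10 bits for the two surrogate halves) are assumptions made for the example.

```java
final class ShiftChecks {
    /** True for 0x0000..0xFFFF; negative inputs fail because the shift is unsigned. */
    static boolean isBmpCodePoint(int codePoint) {
        return (codePoint >>> 16) == 0;
    }

    /** High surrogates 0xD800..0xDBFF all share the same top six bits. */
    static boolean isHighSurrogate(char ch) {
        return (ch >>> 10) == (Character.MIN_HIGH_SURROGATE >>> 10);
    }

    /** Low surrogates 0xDC00..0xDFFF all share the same top six bits. */
    static boolean isLowSurrogate(char ch) {
        return (ch >>> 10) == (Character.MIN_LOW_SURROGATE >>> 10);
    }

    public static void main(String[] args) {
        // Compare against the library methods over the whole char range.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            if (isHighSurrogate(ch) != Character.isHighSurrogate(ch)
                    || isLowSurrogate(ch) != Character.isLowSurrogate(ch)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(i));
            }
        }
        System.out.println("shift checks agree with java.lang.Character");
    }
}
```

Each comparison is then against a constant that fits in one byte, which is the property the list is after; whether the JIT actually emits shorter code is not something this sketch demonstrates.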
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ > That reminds me that some months ago I prepared a beautified version of Character's source (things like the above, replacing <code> with {@code}, indentation inconsistencies etc.). Would there be interest in such a patch? > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ > In encodeBufferLoop() you could use putChar(), putInt() instead of put(). That should perform better. > Sherman (or Alan), > > please review and/or file bugs for the above changes. > > isBMPCodePoint is a spec addition, requiring additional paperwork. > > Sherman, you owe me a response to my now-moldy proposed changes to > the UTF-8 charset. > > The only controversial change would be the change in behavior in > malformed-utf8, which I can take out. > This reminds me of some thoughts. To be *exact*, I think malformed should be returned for all codes that are invalid in the character set in question. So first validate for unmappable and second for invalid (= malformed). This doesn't cost any performance while looping over mappable and valid characters, only a little more effort after the loop is interrupted to form the right CoderResult. 
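Editorial illustration of the distinction drawn above: the sketch below shows how `CoderResult` already reports "unmappable" and "malformed" as separate outcomes for an encoder. It uses US-ASCII rather than UTF-8 and does not show the proposed change in behavior; the class `CoderResultDemo` and the helper `encodeOnce` are names invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

final class CoderResultDemo {
    /** Encodes the whole string in one call and returns the raw CoderResult. */
    static CoderResult encodeOnce(String input) {
        // A fresh encoder reports malformed and unmappable input by default.
        CharsetEncoder encoder = Charset.forName("US-ASCII").newEncoder();
        return encoder.encode(CharBuffer.wrap(input), ByteBuffer.allocate(16), true);
    }

    public static void main(String[] args) {
        // Plain ASCII: everything is consumed.
        System.out.println(encodeOnce("abc").isUnderflow());
        // A valid character that US-ASCII cannot represent.
        System.out.println(encodeOnce("\u00E9").isUnmappable());
        // A lone low surrogate is not legal UTF-16 input at all.
        System.out.println(encodeOnce("\uDC00").isMalformed());
    }
}
```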
-Ulf From Xueming.Shen at Sun.COM Thu Mar 11 21:24:44 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Thu, 11 Mar 2010 13:24:44 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> Message-ID: <4B995F9C.3070705@sun.com> Martin, Ulf, The following bugs/RFEs have been filed. 6934265: Add public method Character.isBMPCodePoint 6934268: Better implementation of Character.isValidCodePoint and isSupplementaryCodePoint() 6934270: Remove javac warnings from Character.java 6934271: Better handling of longer utf-8 sequences Masayoshi, Alan, would you please help review the corresponding CCC for 6934265 at http://ccc.sfbay.sun.com/6934265 Martin, don't touch the utf-8 malformed issue for now; an incompatible change in UTF-8 is an issue. sherman Martin Buchholz wrote: > Ulf, your changes would be easier to get in > if they were organized as mq patch files that > could be qimported into an existing mq repo. > > I've done that below, which includes a subset of > your own proposed changes: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ > > Sherman (or Alan), > > please review and/or file bugs for the above changes. 
> > isBMPCodePoint is a spec addition, requiring additional paperwork. > > Sherman, you owe me a response to my now-moldy proposed changes to > the UTF-8 charset. > > The only controversial change would be the change in behavior in > malformed-utf8, which I can take out. > > Martin > > On Thu, Mar 11, 2010 at 10:32, Ulf Zibis wrote: > >> Sherman, >> >> I know, your time ... >> >> ... but maybe someone is needed for sponsor here: >> https://bugs.openjdk.java.net/show_bug.cgi?id=100132 >> >> Could you do this? >> >> Much thanks, >> >> -Ulf >> >> >> Am 10.03.2010 19:23, schrieb Xueming Shen: >> >>> approved. >>> >>> I don't have a spare ws right now.so please just push, it's almost >>> there:-) >>> >>> sherman >>> >>> Martin Buchholz wrote: >>> >>>> Here's the proposed fix for >>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) >>>> >>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ >>>> >>>> I changed the name to isBMPCodePoint in preparation for moving >>>> it to Character.java. >>>> (Sherman, perhaps you would like to take on that followon task?) >>>> >>>> Sherman, please approve. >>>> >>>> Martin >>>> >>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: >>>> >>>>> Very fast Sherman, much thanks. >>>>> >>>>> Could you set the bug to accepted and evaluated, so my patch will have a >>>>> chance to get into the code base? >>>>> >>>>> -Ulf >>>>> >>>>> >>>>> Am 03.03.2010 20:11, schrieb Xueming Shen: >>>>> >>>>>> #6931812 >>>>>> >>>>>> Martin Buchholz wrote: >>>>>> >>>>>>> Sherman, would you like to file bugs for Ulf's improvements? >>>>>>> >>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>>>>> >>>>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>>>>>>> >>>>>>>>> Keep in mind that supplementary characters are extremely rare. >>>>>>>>> >>>>>>>>> >>>>>>>> Yes, but many API's in the JDK are used rarely. 
>>>>>>>> Why should they waste memory footprint / perform bad, particularly if >>>>>>>> it >>>>>>>> doesn't cost anything. >>>>>>>> >>>>>>> I admire your perfectionism. >>>>>>> >>>>>>> >>>>>>>>> Therefore the existing implementation >>>>>>>>> >>>>>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>>>>> && codePoint<= MAX_CODE_POINT; >>>>>>>>> >>>>>>>>> will almost always perform just one comparison against a constant, >>>>>>>>> which is hard to beat. >>>>>>>>> >>>>>>>>> >>>>>>>> 1. Wondering: I think there are TWO comparisons. >>>>>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>>>>> against >>>>>>>> only 8 bit values in my case. >>>>>>>> >>>>>>> It's a good point. In the machine code, shifts are likely to use >>>>>>> immediate values, and so will be a small win. >>>>>>> >>>>>>> int x = codePoint >>> 16; >>>>>>> return x != 0 && x < 0x11; >>>>>>> >>>>>>> (On modern hardware, these optimizations >>>>>>> are less valuable than they used to be; >>>>>>> ordinary integer arithmetic is almost free) >>>>>>> >>>>>>> Martin >>>>>>> >>> >> From Xueming.Shen at Sun.COM Thu Mar 11 21:45:37 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Thu, 11 Mar 2010 13:45:37 -0800 Subject: Codereview needed for #6929479 In-Reply-To: <4B990D93.9050806@sun.com> References: <4B86CAE8.3080008@sun.com> <4B86D43D.4000002@sun.com> <4B86D6C4.4080600@sun.com> <4B86D763.8080603@sun.com> <4B989210.4020303@sun.com> <4B990D93.9050806@sun.com> Message-ID: <4B996481.7000102@sun.com> Alan Bateman wrote: > Xueming Shen wrote: >> Alan, >> >> webrev has been updated to use the sun.zip.disableMemoryMapping >> >> http://cr.openjdk.java.net/~sherman/6929479/webrev >> >> Please review. >> >> Thanks, >> Sherman > I agree it's a better name. In ZipFile it would be good to put a > comment at the initialization so that the reader understands what this > property is about. Minor nit in zip_util.c at L805 is that it looks > like the indenting it out by one. 
In any case, this will be a useful > debugging option for the next time that someone steps on their own feet. > > -Alan > Thanks Alan. One comment has been added into ZipFile. I counted the space in zip_util.c/L805:-) it appears the indenting is correct. maybe because of font setting? Sherman From xueming.shen at sun.com Thu Mar 11 22:13:54 2010 From: xueming.shen at sun.com (xueming.shen at sun.com) Date: Thu, 11 Mar 2010 22:13:54 +0000 Subject: hg: jdk7/tl/jdk: 6929479: Add a system property sun.zip.disableMemoryMapping to disable mmap use in ZipFile Message-ID: <20100311221413.1ACC14478B@hg.openjdk.java.net> Changeset: ee385b4e2ffb Author: sherman Date: 2010-03-11 14:06 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/ee385b4e2ffb 6929479: Add a system property sun.zip.disableMemoryMapping to disable mmap use in ZipFile Summary: system property sun.zip.disableMemoryMapping to disable mmap use Reviewed-by: alanb ! src/share/classes/java/util/zip/ZipFile.java ! src/share/native/java/util/zip/ZipFile.c ! src/share/native/java/util/zip/zip_util.c ! 
src/share/native/java/util/zip/zip_util.h

From martinrb at google.com Fri Mar 12 01:46:37 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 11 Mar 2010 17:46:37 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B995D22.2020507@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> Message-ID: <1ccfd1c11003111746o5dcd81d1veadb0c6a4882df65@mail.gmail.com>

On Thu, Mar 11, 2010 at 13:14, Ulf Zibis wrote:
> On 11.03.2010 20:38, Martin Buchholz wrote:
>>
>> Ulf, your changes would be easier to get in
>> if they were organized as mq patch files that
>> could be qimported into an existing mq repo.
>>
>
> To be honest, I never heard about mq. Can you point me to some docs please?

http://mercurial.selenic.com/wiki/MqExtension
http://hgbook.red-bean.com/read/managing-change-with-mercurial-queues.html

>> I've done that below, which includes a subset of
>> your own proposed changes:
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
>>
>
> - Maybe better: "... using a single {@code char}".
> - Why don't you like using the new isBMPCodePoint() for
> isSupplementaryCodePoint() and toUpperCaseCharArray() ?
> - Same shift magic would enhance isISOControl(),

I propose the following small improvement on your own version of isISOControl:

    public static boolean isISOControl(int codePoint) {
        // Optimized form of:
        //     (codePoint >= 0x0000 && codePoint <= 0x001F) ||
        //     (codePoint >= 0x007F && codePoint <= 0x009F);
        return codePoint <= 0x009F &&
            (codePoint >= 0x007F || (codePoint >>> 5 == 0));
    }

Because non-ASCII chars get away with only one comparison.

> isHighSurrogate(),
> isLowSurrogate(), in particular if latter occur consecutive.
> 8-bit shift + compare would allow HotSpot to compile to smart 1-byte
> immediate op-codes.

Alright, you've talked me into it,
I can't resist your love of micro-optimizations.

More later.

Martin

From Kelly.Ohair at Sun.COM Fri Mar 12 04:59:09 2010 From: Kelly.Ohair at Sun.COM (Kelly O'Hair) Date: Thu, 11 Mar 2010 20:59:09 -0800 Subject: TEST: java/nio/channels/AsynchronousSocketChannel/Basic.java Message-ID:

I'm having problems with this test on Solaris 10 X86 and Fedora 9 32bit. Ring any bells?

-kto

-------------------------------------------------- TEST: java/nio/channels/AsynchronousSocketChannel/Basic.java JDK under test: (/tmp/jprt/P1/T/173102.ohair/testproduct/solaris_i586_5.10-product) openjdk version "1.7.0-2010-03-11-173102.ohair.jdk" OpenJDK Runtime Environment (build 1.7.0-2010-03-11-173102.ohair.jdk-jprtadm_2010_03_11_09_40-b00) Java HotSpot(TM) Server VM (build 17.0-b10, mixed mode) ACTION: build -- Passed. Build successful REASON: Named class compiled on demand TIME: 0.89 seconds messages: command: build Basic reason: Named class compiled on demand elapsed time (seconds): 0.89 ACTION: compile -- Passed. Compilation successful REASON: .class file out of date or does not exist TIME: 0.89 seconds messages: command: compile /tmp/jprt/P1/T/173102.ohair/source/test/java/nio/channels/AsynchronousSocketChannel/Basic.java reason: .class file out of date or does not exist elapsed time (seconds): 0.89 STDOUT: STDERR: ACTION: main -- Failed.
Execution failed: `main' threw exception: java.lang.RuntimeException: Should not connect REASON: User specified action: run main/timeout=600 Basic TIME: 4.007 seconds messages: command: main Basic reason: User specified action: run main/timeout=600 Basic elapsed time (seconds): 4.007 STDOUT: -- bind -- -- socket options -- -- connect -- -- connect to non-existent host -- -- asynchronous close when connecting -- STDERR:
java.lang.RuntimeException: Should not connect
    at Basic.testCloseWhenPending(Basic.java:238)
    at Basic.main(Basic.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:613)
    at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94)
    at java.lang.Thread.run(Thread.java:717)
JavaTest Message: Test threw exception: java.lang.RuntimeException: Should not connect JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Should not connect TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Should not connect -------------------------------------------------- -------------------------------------------------- TEST: java/nio/channels/AsynchronousSocketChannel/Basic.java JDK under test: (/tmp/jprt/P1/T/173102.ohair/testproduct/linux_i586_2.6-product) openjdk version "1.7.0-2010-03-11-173102.ohair.jdk" OpenJDK Runtime Environment (build 1.7.0-2010-03-11-173102.ohair.jdk-jprtadm_2010_03_11_09_41-b00) Java HotSpot(TM) Server VM (build 17.0-b10, mixed mode) ACTION: build -- Passed. Build successful REASON: Named class compiled on demand TIME: 0.85 seconds messages: command: build Basic reason: Named class compiled on demand elapsed time (seconds): 0.85 ACTION: compile -- Passed.
Compilation successful REASON: .class file out of date or does not exist TIME: 0.85 seconds messages: command: compile /tmp/jprt/P1/T/173102.ohair/source/test/java/nio/channels/AsynchronousSocketChannel/Basic.java reason: .class file out of date or does not exist elapsed time (seconds): 0.85 STDOUT: STDERR: ACTION: main -- Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Connection should not be established REASON: User specified action: run main/timeout=600 Basic TIME: 10.574 seconds messages: command: main Basic reason: User specified action: run main/timeout=600 Basic elapsed time (seconds): 10.574 STDOUT: -- bind -- -- socket options -- -- connect -- -- connect to non-existent host -- STDERR:
java.lang.RuntimeException: Connection should not be established
    at Basic.testConnect(Basic.java:206)
    at Basic.main(Basic.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:613)
    at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94)
    at java.lang.Thread.run(Thread.java:717)
JavaTest Message: Test threw exception: java.lang.RuntimeException: Connection should not be established JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Connection should not be established TEST RESULT: Failed.
Execution failed: `main' threw exception: java.lang.RuntimeException: Connection should not be established -------------------------------------------------- From Alan.Bateman at Sun.COM Fri Mar 12 08:06:55 2010 From: Alan.Bateman at Sun.COM (Alan Bateman) Date: Fri, 12 Mar 2010 08:06:55 +0000 Subject: TEST: java/nio/channels/AsynchronousSocketChannel/Basic.java In-Reply-To: References: Message-ID: <4B99F61F.6060705@sun.com> Kelly O'Hair wrote: > > I'm having problems with this test on Solaris 10 X86 and Fedora 9 32bit. > Ring any bells? > -kto I haven't seen this failure but looking at the test now, the connect can fail immediately which would cause both of the test failures - can you create a bug and I'll fix this. -Alan. From Ulf.Zibis at gmx.de Fri Mar 12 13:04:39 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 12 Mar 2010 14:04:39 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003111746o5dcd81d1veadb0c6a4882df65@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003111746o5dcd81d1veadb0c6a4882df65@mail.gmail.com> Message-ID: <4B9A3BE7.4090502@gmx.de> Am 12.03.2010 02:46, schrieb Martin Buchholz: > http://mercurial.selenic.com/wiki/MqExtension > http://hgbook.red-bean.com/read/managing-change-with-mercurial-queues.html > Ah, I see mq is part of hg. Another thing to learn, but sounds good. Unfortunately there seems no GUI for it. I run TortoiseHG on my Windows. 
>
> I propose the following small improvement on your own
> version of isISOControl:
>
> public static boolean isISOControl(int codePoint) {
>     // Optimized form of:
>     //     (codePoint >= 0x0000 && codePoint <= 0x001F) ||
>     //     (codePoint >= 0x007F && codePoint <= 0x009F);
>     return codePoint <= 0x009F &&
>         (codePoint >= 0x007F || (codePoint >>> 5 == 0));
> }
>
> Because non-ASCII chars get away with only one comparison.
>

+1 thanks.
Because here we are talking about ASCII values, I would prefer 2-digit values or complete code points e.g. 0x0000007F.

>
> Alright, you've talked me into it,
> I can't resist your love of micro-optimizations.
>

Is that sentence correct English grammar? I'm afraid to misunderstand.

By the way, I've filed some bugs against HotSpot to optimize those cases:
6932837 - Better use unsigned jump if one of the range limits is 0
6932855 - Save superfluous CMP instruction from while loop
6933327 - Use shifted addressing modes instead of shift instuctions
6933324 - Always inline methods, which have only 1 call site

If they would be accepted and fixed, some of our twiddling would become superfluous, at least using c2, but maybe not for interpreter and c1.

-Ulf
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From Ulf.Zibis at gmx.de Fri Mar 12 16:41:07 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 12 Mar 2010 17:41:07 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> Message-ID: <4B9A6EA3.7070603@gmx.de> Hi Martin, is that 6666666 a fake bug id? I still can't see it in the public bugparade: * This bug is not available. More information is available at -http://developers.sun.com/resources/bugsFAQ.html#s4q2 -Ulf Am 11.03.2010 20:38, schrieb Martin Buchholz: > Ulf, your changes would be easier to get in > if they were organized as mq patch files that > could be qimported into an existing mq repo. > > I've done that below, which includes a subset of > your own proposed changes: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ > > Sherman (or Alan), > > please review and/or file bugs for the above changes. > > isBMPCodePoint is a spec addition, requiring additional paperwork. > > Sherman, you owe me a response to my now-moldy proposed changes to > the UTF-8 charset. > > The only controversial change would be the change in behavior in > malformed-utf8, which I can take out. > > Martin > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Kelly.Ohair at Sun.COM Fri Mar 12 17:15:33 2010 From: Kelly.Ohair at Sun.COM (Kelly O'Hair) Date: Fri, 12 Mar 2010 09:15:33 -0800 Subject: TEST: java/nio/channels/AsynchronousSocketChannel/Basic.java In-Reply-To: <4B99F61F.6060705@sun.com> References: <4B99F61F.6060705@sun.com> Message-ID: <599F4086-783E-4C05-9BEA-262758B32D35@Sun.COM> Filed bug 6934585 -kto On Mar 12, 2010, at 12:06 AM, Alan Bateman wrote: > Kelly O'Hair wrote: >> >> I'm having problems with this test on Solaris 10 X86 and Fedora 9 >> 32bit. >> Ring any bells? >> -kto > I haven't seen this failure but looking at the test now, the connect > can fail immediately which would cause both of the test failures - > can you create a bug and I'll fix this. > > -Alan. From jonathan.gibbons at sun.com Fri Mar 12 20:01:35 2010 From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com) Date: Fri, 12 Mar 2010 20:01:35 +0000 Subject: hg: jdk7/tl/langtools: 6934224: update langtools/test/Makefile Message-ID: <20100312200137.D4E54448C3@hg.openjdk.java.net> Changeset: f856c0942c06 Author: jjg Date: 2010-03-12 12:00 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/f856c0942c06 6934224: update langtools/test/Makefile Reviewed-by: ohair ! make/jprt.properties ! test/Makefile From kelly.ohair at sun.com Fri Mar 12 20:15:43 2010 From: kelly.ohair at sun.com (kelly.ohair at sun.com) Date: Fri, 12 Mar 2010 20:15:43 +0000 Subject: hg: jdk7/tl/jdk: 2 new changesets Message-ID: <20100312201621.2443B448C8@hg.openjdk.java.net> Changeset: bf6eb240e718 Author: ohair Date: 2010-03-12 09:03 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/bf6eb240e718 6933294: Fix some test/Makefile issues around Linux ARCH settings, better defaults Reviewed-by: jjg ! test/Makefile ! test/ProblemList.txt Changeset: cda90ceb7176 Author: ohair Date: 2010-03-12 09:06 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/cda90ceb7176 Merge ! 
test/ProblemList.txt

From martinrb at google.com Fri Mar 12 23:04:06 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 12 Mar 2010 15:04:06 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B995D22.2020507@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> Message-ID: <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com>

On Thu, Mar 11, 2010 at 13:14, Ulf Zibis wrote:
> On 11.03.2010 20:38, Martin Buchholz wrote:

> - Maybe better: "... using a single {@code char}".

Done.

> - Why don't you like using the new isBMPCodePoint() for
> isSupplementaryCodePoint() and toUpperCaseCharArray() ?

I now use it for the assert in toUpperCaseCharArray()

> - Same shift magic would enhance isISOControl(), isHighSurrogate(),
> isLowSurrogate(), in particular if latter occur consecutive.

isISOControl - yes, others - I am not convinced.

> 8-bit shift + compare would allow HotSpot to compile to smart 1-byte
> immediate op-codes.
> - Don't you think my notes on validity are worth to add. (or separate bug ?)

I agree something could be done here - separate bug.

> - Changing ch <= MAX_SURROGATE to ch < MAX_SURROGATE + 1 would allow HotSpot
> compiler to optimize 1 branch if those methods are used consecutive.

Done.

> - And at last, I would like to make the constants complete (= adding
> MAX_SUPPLEMENTARY_CODE_POINT).

I have no objection to adding those, but I am not in favor either.
You'll need to convince someone else.
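
The shift-based tests discussed in this thread, and the half-open bound `ch < MAX_SURROGATE + 1` mentioned above, can be compared against the straightforward range checks. What follows is only a minimal sketch: the class and method names are illustrative assumptions and are not taken from the webrevs, and it says nothing about what HotSpot actually emits for either form.

```java
class CodePointShiftCheck {
    // Shift-based BMP test: true exactly for 0x0000..0xFFFF.
    // The unsigned shift makes negative inputs fail the test.
    static boolean isBmpShift(int cp) {
        return cp >>> 16 == 0;
    }

    // Shift-based supplementary test: the plane number must be 1..16.
    static boolean isSupplementaryShift(int cp) {
        int plane = cp >>> 16;
        return plane != 0 && plane < 0x11;
    }

    // Half-open upper bound, written as "ch < MAX_SURROGATE + 1".
    static boolean isSurrogateHalfOpen(char ch) {
        return ch >= Character.MIN_SURROGATE
            && ch < Character.MAX_SURROGATE + 1;
    }

    // Counts inputs on which a rewritten form disagrees with the plain form.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = -0x20000; cp <= 0x120000; cp++) {
            boolean bmpPlain = cp >= 0 && cp <= 0xFFFF;
            if (isBmpShift(cp) != bmpPlain) {
                mismatches++;
            }
            if (isSupplementaryShift(cp) != Character.isSupplementaryCodePoint(cp)) {
                mismatches++;
            }
        }
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            char ch = (char) c;
            boolean plain = Character.isHighSurrogate(ch) || Character.isLowSurrogate(ch);
            if (isSurrogateHalfOpen(ch) != plain) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches = " + countMismatches());
    }
}
```

A count of zero only shows that the rewritten forms are behavior-preserving over the tested range; whether they are faster is a separate question that needs measurement.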
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ >> > > Remembers me that some months ago I prepared a beautified version of > Character's source (things like above, replacing against {@code}, > indentation inconsistencies etc.) Would there be interest to provide such a > patch ? Please provide URL of patch. >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ > In encodeBufferLoop() you could use putChar(), putInt() instead put(). > Should perform better. I'm not convinced. You would need to assemble bytes into an int, and then break them apart into bytes on the other side? Martin From martinrb at google.com Fri Mar 12 23:13:25 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 12 Mar 2010 15:13:25 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B9A3BE7.4090502@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003111746o5dcd81d1veadb0c6a4882df65@mail.gmail.com> <4B9A3BE7.4090502@gmx.de> Message-ID: <1ccfd1c11003121513g5a5c4a43xc44a79da36975ab7@mail.gmail.com> On Fri, Mar 12, 2010 at 05:04, Ulf Zibis wrote: > Am 12.03.2010 02:46, schrieb Martin Buchholz: > public static boolean isISOControl(int codePoint) { > // Optimized form of: > // (codePoint >= 0x0000 && codePoint <= 0x001F) || > // (codePoint >= 0x007F && codePoint <= 0x009F); > return codePoint <= 0x009F && > (codePoint >= 0x007F || (codePoint >>> 5 == 0)); > } > > Because non-ASCII chars get away with only one comparison. > > > +1 thanks. > Because here we are talking about ASCII values, I would prefer 2-digit > values or complete code points e.g. 0x0000007F. Good idea. Done. 
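
To see that the optimized isISOControl agrees with the two-range check it replaces, the two forms can be compared directly. This is a minimal sketch using the 2-digit constants suggested above; the class and helper names are illustrative assumptions, not code from the webrev.

```java
class IsoControlCheck {
    // Plain range form, as spelled out in the comment of the proposed method.
    static boolean isISOControlPlain(int codePoint) {
        return (codePoint >= 0x00 && codePoint <= 0x1F)
            || (codePoint >= 0x7F && codePoint <= 0x9F);
    }

    // Optimized form from the thread: anything above 0x9F is rejected
    // after a single comparison.
    static boolean isISOControlFast(int codePoint) {
        return codePoint <= 0x9F
            && (codePoint >= 0x7F || (codePoint >>> 5 == 0));
    }

    // Counts inputs in [from, to] on which the two forms disagree.
    // A long loop variable avoids overflow when 'to' is Integer.MAX_VALUE.
    static int countMismatches(int from, int to) {
        int mismatches = 0;
        for (long i = from; i <= to; i++) {
            int cp = (int) i;
            if (isISOControlPlain(cp) != isISOControlFast(cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int mismatches = countMismatches(-0x10000, 0x110000)
            + countMismatches(Integer.MIN_VALUE, Integer.MIN_VALUE + 0xFF)
            + countMismatches(Integer.MAX_VALUE - 0xFF, Integer.MAX_VALUE);
        System.out.println("mismatches = " + mismatches);
    }
}
```

Negative inputs are worth including in such a check: they pass `codePoint <= 0x9F`, and it is the unsigned shift `>>> 5` that keeps them from being reported as control characters.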
> > Alright, you've talked me into it, > I can't resist your love of micro-optimizations. > > > Is that sentence correct english grammar? I'm afraid to misunderstand. , ==> . > By the way, I've filed some bugs against HotSpot to optimize those cases: > 6932837 - Better use unsigned jump if one of the range limits is 0 > 6932855 - Save superfluous CMP instruction from while loop > 6933327 - Use shifted addressing modes instead of shift instuctions > 6933324 - Always inline methods, which have only 1 call site > > If they would be accepted and fixed, some of our twiddling would become > superfluous, at least using c2, but maybe not for interpreter and c1. Of course, we often write hotspot-optimized code, but in general we should try to write "good" code that any VM could love. Martin From jonathan.gibbons at sun.com Fri Mar 12 23:25:00 2010 From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com) Date: Fri, 12 Mar 2010 23:25:00 +0000 Subject: hg: jdk7/tl: 6934712: run langtools jtreg tests from top level test/Makefile Message-ID: <20100312232500.DB883448F5@hg.openjdk.java.net> Changeset: bbd817429100 Author: jjg Date: 2010-03-12 15:22 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/rev/bbd817429100 6934712: run langtools jtreg tests from top level test/Makefile Reviewed-by: ohair ! 
test/Makefile From martinrb at google.com Fri Mar 12 23:29:53 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 12 Mar 2010 15:29:53 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B995F9C.3070705@sun.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995F9C.3070705@sun.com> Message-ID: <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com> OK, next round of review. I changed my UTF-8 changes to be behavior-preserving, removing any hint of controversy, and renamed the patch to "utf8-twiddling". I got Ulf in my head, and can't stop micro-optimizing. I added a new micro-optimizing patch for Bits.java. Please file a bug. 6934268: Better implementation of Character.isValidCodePoint and isSupplementaryCodePoint() http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint 6934265: Add public method Character.isBMPCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint 6934270: Remove javac warnings from Character.java http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings 6934271: Better handling of longer utf-8 sequences http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling 6666666: Optimize bit-twiddling in Bits.java http://cr.openjdk.java.net/~martin/webrevs/openjdk7/qtip tip Bits.java Now I need to go off to my micro-optimizers-anonymous meeting. Martin On Thu, Mar 11, 2010 at 13:24, Xueming Shen wrote: > Martin, Ulf > > Following bug/rfs have been filed. 
> > 6934265 Add public method Character.isBMPCodePoint > 6934268 Better implementation of Character.isValidCodePoint and > isSupplementaryCodePoint() > 6934270: Remove javac warnings from Character.java > 6934271: Better handling of longer utf-8 sequences > > Masayoshi, Alan would you please help review the corresponding CCC for > 6934265 at > http://ccc.sfbay.sun.com/6934265 > > Martin, don't touch the utf-8 malformed issue for now, and incompatible > change in UTF-8 > is A issue. > > sherman > > Martin Buchholz wrote: >> >> Ulf, your changes would be easier to get in >> if they were organized as mq patch files that >> could be qimported into an existing mq repo. >> >> I've done that below, which includes a subset of >> your own proposed changes: >> >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ >> >> Sherman (or Alan), >> >> please review and/or file bugs for the above changes. >> >> isBMPCodePoint is a spec addition, requiring additional paperwork. >> >> Sherman, you owe me a response to my now-moldy proposed changes to >> the UTF-8 charset. >> >> The only controversial change would be the change in behavior in >> malformed-utf8, which I can take out. >> >> Martin >> >> On Thu, Mar 11, 2010 at 10:32, Ulf Zibis wrote: >> >>> >>> Sherman, >>> >>> I know, your time ... >>> >>> ... but maybe someone is needed for sponsor here: >>> https://bugs.openjdk.java.net/show_bug.cgi?id=100132 >>> >>> Could you do this? >>> >>> Much thanks, >>> >>> -Ulf >>> >>> >>> Am 10.03.2010 19:23, schrieb Xueming Shen: >>> >>>> >>>> approved. 
>>>> >>>> I don't have a spare ws right now.so please just push, it's almost >>>> there:-) >>>> >>>> sherman >>>> >>>> Martin Buchholz wrote: >>>> >>>>> >>>>> Here's the proposed fix for >>>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) >>>>> >>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ >>>>> >>>>> I changed the name to isBMPCodePoint in preparation for moving >>>>> it to Character.java. >>>>> (Sherman, perhaps you would like to take on that followon task?) >>>>> >>>>> Sherman, please approve. >>>>> >>>>> Martin >>>>> >>>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: >>>>> >>>>>> >>>>>> Very fast Sherman, much thanks. >>>>>> >>>>>> Could you set the bug to accepted and evaluated, so my patch will have >>>>>> a >>>>>> chance to get into the code base? >>>>>> >>>>>> -Ulf >>>>>> >>>>>> >>>>>> Am 03.03.2010 20:11, schrieb Xueming Shen: >>>>>> >>>>>>> >>>>>>> #6931812 >>>>>>> >>>>>>> Martin Buchholz wrote: >>>>>>> >>>>>>>> >>>>>>>> Sherman, would you like to file bugs for Ulf's improvements? >>>>>>>> >>>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Keep in mind that supplementary characters are extremely rare. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes, but many API's in the JDK are used rarely. >>>>>>>>> Why should they waste memory footprint / perform bad, particularly >>>>>>>>> if >>>>>>>>> it >>>>>>>>> doesn't cost anything. >>>>>>>>> >>>>>>>> >>>>>>>> I admire your perfectionism. >>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>> Therefore the existing implementation >>>>>>>>>> >>>>>>>>>> ?return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>>>>>> && ?codePoint<= MAX_CODE_POINT; >>>>>>>>>> >>>>>>>>>> will almost always perform just one comparison against a constant, >>>>>>>>>> which is hard to beat. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> 1. Wondering: I think there are TWO comparisons. >>>>>>>>> 2. 
Those comparisons need to load 32 bit values from machine code, >>>>>>>>> against >>>>>>>>> only 8 bit values in my case. >>>>>>>>> >>>>>>>> >>>>>>>> It's a good point. In the machine code, shifts are likely to use >>>>>>>> immediate values, and so will be a small win. >>>>>>>> >>>>>>>> int x = codePoint >>> 16; >>>>>>>> return x != 0 && x < 0x11; >>>>>>>> >>>>>>>> (On modern hardware, these optimizations >>>>>>>> are less valuable than they used to be; >>>>>>>> ordinary integer arithmetic is almost free) >>>>>>>> >>>>>>>> Martin >>>>>>>> >>>> >>>> >>> >>> > > From kelly.ohair at sun.com Sat Mar 13 01:47:22 2010 From: kelly.ohair at sun.com (kelly.ohair at sun.com) Date: Sat, 13 Mar 2010 01:47:22 +0000 Subject: hg: jdk7/tl: 6934759: Add langtools testing to jprt control builds Message-ID: <20100313014722.B3FF544918@hg.openjdk.java.net> Changeset: c60ed0f6d91a Author: ohair Date: 2010-03-12 17:44 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/rev/c60ed0f6d91a 6934759: Add langtools testing to jprt control builds Reviewed-by: jjg !
make/jprt.properties From Xueming.Shen at Sun.COM Tue Mar 16 06:26:42 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Mon, 15 Mar 2010 22:26:42 -0800 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995F9C.3070705@sun.com> <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com> Message-ID: <4B9F24A2.2070300@sun.com> CR 6935172 Created, P4 java/classes_io Optimize bit-twiddling in Bits.java Can I assume the webrev is http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java -Sherman Martin Buchholz wrote: > OK, next round of review. > > I changed my UTF-8 changes to be behavior-preserving, > removing any hint of controversy, and renamed the patch > to "utf8-twiddling". > > I got Ulf in my head, and can't stop micro-optimizing. > I added a new micro-optimizing patch for Bits.java. > Please file a bug. 
> > 6934268: Better implementation of Character.isValidCodePoint and > isSupplementaryCodePoint() > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint > 6934265: Add public method Character.isBMPCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint > 6934270: Remove javac warnings from Character.java > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings > 6934271: Better handling of longer utf-8 sequences > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling > 6666666: Optimize bit-twiddling in Bits.java > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/qtip tip Bits.java > > Now I need to go off to my micro-optimizers-anonymous meeting. > > Martin > > On Thu, Mar 11, 2010 at 13:24, Xueming Shen wrote: > >> Martin, Ulf >> >> Following bug/rfs have been filed. >> >> 6934265 Add public method Character.isBMPCodePoint >> 6934268 Better implementation of Character.isValidCodePoint and >> isSupplementaryCodePoint() >> 6934270: Remove javac warnings from Character.java >> 6934271: Better handling of longer utf-8 sequences >> >> Masayoshi, Alan would you please help review the corresponding CCC for >> 6934265 at >> http://ccc.sfbay.sun.com/6934265 >> >> Martin, don't touch the utf-8 malformed issue for now, and incompatible >> change in UTF-8 >> is A issue. >> >> sherman >> >> Martin Buchholz wrote: >> >>> Ulf, your changes would be easier to get in >>> if they were organized as mq patch files that >>> could be qimported into an existing mq repo. 
>>> >>> I've done that below, which includes a subset of >>> your own proposed changes: >>> >>> >>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/ >>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ >>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/ >>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/ >>> >>> Sherman (or Alan), >>> >>> please review and/or file bugs for the above changes. >>> >>> isBMPCodePoint is a spec addition, requiring additional paperwork. >>> >>> Sherman, you owe me a response to my now-moldy proposed changes to >>> the UTF-8 charset. >>> >>> The only controversial change would be the change in behavior in >>> malformed-utf8, which I can take out. >>> >>> Martin >>> >>> On Thu, Mar 11, 2010 at 10:32, Ulf Zibis wrote: >>> >>> >>>> Sherman, >>>> >>>> I know, your time ... >>>> >>>> ... but maybe someone is needed for sponsor here: >>>> https://bugs.openjdk.java.net/show_bug.cgi?id=100132 >>>> >>>> Could you do this? >>>> >>>> Much thanks, >>>> >>>> -Ulf >>>> >>>> >>>> Am 10.03.2010 19:23, schrieb Xueming Shen: >>>> >>>> >>>>> approved. >>>>> >>>>> I don't have a spare ws right now.so please just push, it's almost >>>>> there:-) >>>>> >>>>> sherman >>>>> >>>>> Martin Buchholz wrote: >>>>> >>>>> >>>>>> Here's the proposed fix for >>>>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int) >>>>>> >>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/ >>>>>> >>>>>> I changed the name to isBMPCodePoint in preparation for moving >>>>>> it to Character.java. >>>>>> (Sherman, perhaps you would like to take on that followon task?) >>>>>> >>>>>> Sherman, please approve. >>>>>> >>>>>> Martin >>>>>> >>>>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote: >>>>>> >>>>>> >>>>>>> Very fast Sherman, much thanks. 
>>>>>>> >>>>>>> Could you set the bug to accepted and evaluated, so my patch will have >>>>>>> a >>>>>>> chance to get into the code base? >>>>>>> >>>>>>> -Ulf >>>>>>> >>>>>>> >>>>>>> Am 03.03.2010 20:11, schrieb Xueming Shen: >>>>>>> >>>>>>> >>>>>>>> #6931812 >>>>>>>> >>>>>>>> Martin Buchholz wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Sherman, would you like to file bugs for Ulf's improvements? >>>>>>>>> >>>>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Keep in mind that supplementary characters are extremely rare. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Yes, but many API's in the JDK are used rarely. >>>>>>>>>> Why should they waste memory footprint / perform bad, particularly >>>>>>>>>> if >>>>>>>>>> it >>>>>>>>>> doesn't cost anything. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> I admire your perfectionism. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>>> Therefore the existing implementation >>>>>>>>>>> >>>>>>>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>>>>>>> && codePoint<= MAX_CODE_POINT; >>>>>>>>>>> >>>>>>>>>>> will almost always perform just one comparison against a constant, >>>>>>>>>>> which is hard to beat. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> 1. Wondering: I think there are TWO comparisons. >>>>>>>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>>>>>>> against >>>>>>>>>> only 8 bit values in my case. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> It's a good point. In the machine code, shifts are likely to use >>>>>>>>> immediate values, and so will be a small win. 
>>>>>>>>> >>>>>>>>> int x = codePoint >>> 16; >>>>>>>>> return x != 0 && x < 0x11; >>>>>>>>> >>>>>>>>> (On modern hardware, these optimizations >>>>>>>>> are less valuable than they used to be; >>>>>>>>> ordinary integer arithmetic is almost free) >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> >>>>> >>>> >> From christopher.hegarty at sun.com Tue Mar 16 10:06:44 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Tue, 16 Mar 2010 10:06:44 +0000 Subject: hg: jdk7/tl/jdk: 6934923: test/java/net/ipv6tests/TcpTest.java hangs on Solaris 10 Message-ID: <20100316100704.71B7644D83@hg.openjdk.java.net> Changeset: f88f6f8ddd21 Author: chegar Date: 2010-03-16 10:05 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/f88f6f8ddd21 6934923: test/java/net/ipv6tests/TcpTest.java hangs on Solaris 10 Reviewed-by: alanb ! test/java/net/ipv6tests/TcpTest.java ! test/java/net/ipv6tests/Tests.java From christopher.hegarty at sun.com Tue Mar 16 14:34:19 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Tue, 16 Mar 2010 14:34:19 +0000 Subject: hg: jdk7/tl/jdk: 6935199: java/net regression tests failing with Assertions Message-ID: <20100316143439.1BD6244DC5@hg.openjdk.java.net> Changeset: 895a1211b2e1 Author: chegar Date: 2010-03-16 14:31 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/895a1211b2e1 6935199: java/net regression tests failing with Assertions Reviewed-by: michaelm ! test/ProblemList.txt ! test/java/net/CookieHandler/TestHttpCookie.java ! 
test/java/net/URLClassLoader/closetest/CloseTest.java
From Xueming.Shen at Sun.COM Tue Mar 16 20:06:53 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Tue, 16 Mar 2010 12:06:53 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> Message-ID: <4B9FE4DD.1090405@sun.com>

Martin Buchholz wrote:
> Therefore the existing implementation
>>>   return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT
>>>       && codePoint<= MAX_CODE_POINT;
>>>
>>> will almost always perform just one comparison against a constant,
>>> which is hard to beat.
>>>
>> 1. Wondering: I think there are TWO comparisons.
>> 2. Those comparisons need to load 32 bit values from machine code, against
>> only 8 bit values in my case.
>>
>
> It's a good point. In the machine code, shifts are likely to use
> immediate values, and so will be a small win.
>
> int x = codePoint >>> 16;
> return x != 0 && x < 0x11;
>
> (On modern hardware, these optimizations
> are less valuable than they used to be;
> ordinary integer arithmetic is almost free)
>

I'm not convinced if the proposed code is really better...a "small win".

Without seeing the real native machine code generated, I'm not sure if

   0: iload_0
   1: bipush        16
   3: iushr
   4: istore_1
   5: iload_1
   6: ifeq          19

is really better than

   0: iload_0
   1: ldc           #2    // int 65536
   3: if_icmplt     16

for bmp character case, especially given the existing code has better readability and yes, shorter....

Yes, shift might be able to use the immediate values, but it still needs to handle the "operands" and it is an extra operation. The only chance the new one might be better is that the "ifeq" is faster than "if_icmplt", but have not worked on the instruction set level for too long, so I can't tell (kinda remember you have to check the "circles" of each operation to see which one is "faster" during my old gcc compiler day)

OK, convince me:-)

-Sherman

public class Character extends java.lang.Object {
  public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536;

  public static final int MAX_CODE_POINT = 1114111;

  public Character();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/lang/Object."<init>":()V
       4: return

  public static boolean isSupplementaryCodePoint(int);
    Code:
       0: iload_0
       1: ldc           #2    // int 65536
       3: if_icmplt     16
       6: iload_0
       7: ldc           #3    // int 1114111
       9: if_icmpgt     16
      12: iconst_1
      13: goto          17
      16: iconst_0
      17: ireturn

  public static boolean isSupplementaryCodePoint_new(int);
    Code:
       0: iload_0
       1: bipush        16
       3: iushr
       4: istore_1
       5: iload_1
       6: ifeq          19
       9: iload_1
      10: bipush        17
      12: if_icmpge     19
      15: iconst_1
      16: goto          20
      19: iconst_0
      20: ireturn
}

From Ulf.Zibis at gmx.de Tue Mar 16 19:48:07 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 20:48:07 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> Message-ID: <4B9FE077.3060608@gmx.de>

Here my additions: Am 13.03.2010 00:04, schrieb Martin Buchholz: >> - Why don't you like using the new isBMPCodePoint() for >> isSupplementaryCodePoint() and toUpperCaseCharArray() ?
>> > I now use it for the assert in toUpperCaseCharArray() > return !isBMPCodePoint() && isValidCodePoint(); resolves in same than current code. > >> - Same shift magic would enhance isISOControl(), isHighSurrogate(), >> isLowSurrogate(), in particular if latter occur consecutive. >> > isISOControl - yes, others - I am not convinced. > If virtually shifted by 8, HotSpot could use cheaper 1-byte compare on the high byte. Additionally, those methods are often used consecutively, so all 4 compares would benefit from. >> 8-bit shift + compare would allow HotSpot to compile to smart 1-byte >> immediate op-codes. >> In encodeBufferLoop() you could use putChar(), putInt() instead put(). >> Should perform better. >> > I'm not convinced. You would need to assemble bytes into an > int, and then break them apart into bytes on the other side? > Some time ago, I disassembled such code. I could see that the int was copied directly to memory by one 32-bit move instruction. In the case of put(byte), I saw four 8-bit move instructions. I have not disassembled whether a 3-byte value would first be collected in a 3-byte byte[] and then copied by put(byte[]). Maybe HotSpot could optimize here too. Try it out. 2 will see more than 1. Maybe I was in error. BTW: for the same optimization, I would like to have putInt() and putLong() in CharBuffer, ShortBuffer and for the latter in IntBuffer.
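To make the claim at the top of this message concrete (that !isBMPCodePoint() && isValidCodePoint() gives the same result as the current code), here is a minimal Java sketch. The class name SupplementaryCheck and its method names are made up for illustration, and the bodies of isBMP and isValid are assumptions based on the shift idiom discussed in this thread, not the contents of any webrev. The sketch only checks that the three formulations agree on boundary values; it says nothing about which one HotSpot compiles to better machine code.

```java
// Hypothetical helper class: compares three formulations of the
// supplementary-code-point test that are discussed in this thread.
final class SupplementaryCheck {
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x10000; // 65536
    static final int MAX_CODE_POINT = 0x10FFFF;              // 1114111

    // Existing form: two comparisons against 32-bit constants.
    static boolean existing(int cp) {
        return cp >= MIN_SUPPLEMENTARY_CODE_POINT && cp <= MAX_CODE_POINT;
    }

    // Shifted form quoted in the thread: one unsigned shift, small constants.
    static boolean shifted(int cp) {
        int x = cp >>> 16;
        return x != 0 && x < 0x11;
    }

    // Assumed body: a code point is in the BMP when its upper 16 bits are zero.
    static boolean isBMP(int cp) {
        return (cp >>> 16) == 0;
    }

    // Assumed body: valid when the plane number is below 0x11 (17 planes).
    // Negative inputs shift to 0x8000..0xFFFF and are therefore rejected.
    static boolean isValid(int cp) {
        return (cp >>> 16) < ((MAX_CODE_POINT + 1) >>> 16);
    }

    // Composed form: "not BMP, but still valid".
    static boolean composed(int cp) {
        return !isBMP(cp) && isValid(cp);
    }

    // Returns true when all three forms agree on a set of boundary values.
    static boolean allAgree() {
        int[] samples = {
            Integer.MIN_VALUE, -1, 0, 'A', 0xD800, 0xFFFF,
            0x10000, 0x10001, 0x10FFFF, 0x110000, Integer.MAX_VALUE
        };
        for (int cp : samples) {
            boolean expected = existing(cp);
            if (shifted(cp) != expected || composed(cp) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all forms agree: " + allAgree());
    }
}
```

Whether the composed form costs anything extra depends on both helpers being inlined, which this sketch does not measure.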
-Ulf From Ulf.Zibis at gmx.de Tue Mar 16 20:10:08 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 21:10:08 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4B9FE4DD.1090405@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> Message-ID: <4B9FE5A0.7010409@gmx.de> Here you can see, how HotSpot could benefit from that bit twiddling: I've filed some bugs against HotSpot to optimize those cases: 6932837 - Better use unsigned jump if one of the range limits is 0 6933327 - Use shifted addressing modes instead of shift instuctions -Ulf Am 16.03.2010 21:06, schrieb Xueming Shen: > Martin Buchholz wrote: >> Therefore the existing implementation >>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>> && codePoint<= MAX_CODE_POINT; >>>> >>>> will almost always perform just one comparison against a constant, >>>> which is hard to beat. >>>> >>> 1. Wondering: I think there are TWO comparisons. >>> 2. Those comparisons need to load 32 bit values from machine code, >>> against >>> only 8 bit values in my case. >> >> It's a good point. In the machine code, shifts are likely to use >> immediate values, and so will be a small win. >> >> int x = codePoint >>> 16; >> return x != 0 && x < 0x11; >> >> (On modern hardware, these optimizations >> are less valuable than they used to be; >> ordinary integer arithmetic is almost free) >> > > I'm not convinced if the proposed code is really better...a "small win". 
> > Without seeing the real native machine code generated, I'm not sure > if > > 0: iload_0 1: bipush 16 > 3: iushr 4: istore_1 5: iload_1 > 6: ifeq 19 > > is really better than > > 0: iload_0 1: ldc #2 // int 65536 > 3: if_icmplt 16 > > > for bmp character case, especially given the existing code has better > readability and yes, shorter.... > > Yes, shift might be able to use the immediate values, but it still > needs to handle the "operands" > and it is an extra operation. The only chance the new one might be > better is that the "ifeq" is > faster than "if_icmplt", but have not worked on the instruction set > level for too long, so I can't > tell (kinda remember you have to check the "circles" of each operation > to see which one is > "faster" during my old gcc compiler day) > > OK, convince me:-) > > -Sherman > > > public class Character extends java.lang.Object { > public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; > > public static final int MAX_CODE_POINT = 1114111; > > public Character(); > Code: > 0: aload_0 1: invokespecial #1 // > Method java/lang/Object."":()V > 4: return > public static boolean isSupplementaryCodePoint(int); > Code: > 0: iload_0 1: ldc #2 // > int 65536 > 3: if_icmplt 16 > 6: iload_0 7: ldc #3 // > int 1114111 > 9: if_icmpgt 16 > 12: iconst_1 13: goto 17 > 16: iconst_0 17: ireturn > public static boolean isSupplementaryCodePoint_new(int); > Code: > 0: iload_0 1: bipush 16 > 3: iushr 4: istore_1 5: iload_1 > 6: ifeq 19 > 9: iload_1 10: bipush 17 > 12: if_icmpge 19 > 15: iconst_1 16: goto 20 > 19: iconst_0 20: ireturn } > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From martinrb at google.com Tue Mar 16 20:28:05 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 16 Mar 2010 13:28:05 -0700 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B9FE077.3060608@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4B9FE077.3060608@gmx.de> Message-ID: <1ccfd1c11003161328i5334041fre25eb31d9fa53e9e@mail.gmail.com> On Tue, Mar 16, 2010 at 12:48, Ulf Zibis wrote: > Here my additions: > > Am 13.03.2010 00:04, schrieb Martin Buchholz: >>> >>> - Why don't you like using the new isBMPCodePoint() for >>> isSupplementaryCodePoint() and toUpperCaseCharArray() ? >>> >> >> I now use it for the assert in toUpperCaseCharArray() >> > > return !isBMPCodePoint() && isValidCodePoint(); > resolves in same than current code. Hmmmm...... Yes, you've convinced me! Done. >>> - Same shift magic would enhance isISOControl(), isHighSurrogate(), >>> isLowSurrogate(), in particular if latter occur consecutive. >>> >> >> isISOControl - yes, others - I am not convinced. >> > > If virtually shifted by 8, HotSpot could use cheaper 1-byte compare on the > high byte. > Additionally, those methods are often used consecutively, so all 4 compares > would benefit from. Sorry, I'm still not convinced for the surrogate testing methods. Almost all chars are less than MIN_SURROGATE, so you have to beat the already amazingly cheap x >= MIN_SURROGATE.
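For readers following the surrogate argument, here is a rough Java sketch that puts the plain range checks next to a "shift by 8, compare the high byte" form. The class and method names are hypothetical, and the shifted bodies are only one possible reading of Ulf's suggestion, not code from any webrev. The sketch verifies that both forms classify every char identically; it does not settle the performance question, since for typical text the first range comparison already fails for almost every char.

```java
// Hypothetical helper class: range-based versus high-byte-based surrogate tests.
final class SurrogateCheck {
    // Plain range checks using the constants from java.lang.Character.
    static boolean highByRange(char ch) {
        return ch >= Character.MIN_HIGH_SURROGATE && ch <= Character.MAX_HIGH_SURROGATE;
    }

    static boolean lowByRange(char ch) {
        return ch >= Character.MIN_LOW_SURROGATE && ch <= Character.MAX_LOW_SURROGATE;
    }

    // Assumed shifted form: high surrogates U+D800..U+DBFF have high byte 0xD8..0xDB.
    static boolean highByShift(char ch) {
        int hi = ch >>> 8;
        return hi >= 0xD8 && hi <= 0xDB;
    }

    // Assumed shifted form: low surrogates U+DC00..U+DFFF have high byte 0xDC..0xDF.
    static boolean lowByShift(char ch) {
        int hi = ch >>> 8;
        return hi >= 0xDC && hi <= 0xDF;
    }

    // Exhaustively compares both forms over all 65536 char values.
    static boolean allAgree() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            char ch = (char) c;
            if (highByRange(ch) != highByShift(ch) || lowByRange(ch) != lowByShift(ch)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("range and shifted forms agree: " + allAgree());
    }
}
```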
Martin From Xueming.Shen at Sun.COM Tue Mar 16 21:30:47 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Tue, 16 Mar 2010 13:30:47 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4B9FE5A0.7010409@gmx.de> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <4B9FE5A0.7010409@gmx.de> Message-ID: <4B9FF887.6080402@sun.com> What did you mean "Hotspot could benefit from..." Are you saying? if ( 6932837 gets fixed ) { existing isSupplementaryCodePoint() impl is better } else if ( 6933327 gets fixed ) { the proposed is better } else { existing isSupplementaryCodePoint() impl might still be better } So we will only see any benefit if they "don't fix 6932837, but fix 6933327"? -Sherman Ulf Zibis wrote: > Here you can see, how HotSpot could benefit from that bit twiddling: > > I've filed some bugs against HotSpot to optimize those cases: > 6932837 - > Better use unsigned jump if one of the range limits is 0 > 6933327 - > Use shifted addressing modes instead of shift instuctions > > -Ulf > > > Am 16.03.2010 21:06, schrieb Xueming Shen: >> Martin Buchholz wrote: >>> Therefore the existing implementation >>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>> && codePoint<= MAX_CODE_POINT; >>>>> >>>>> will almost always perform just one comparison against a constant, >>>>> which is hard to beat. >>>>> >>>>> >>>> 1. Wondering: I think there are TWO comparisons. >>>> 2. Those comparisons need to load 32 bit values from machine code, >>>> against >>>> only 8 bit values in my case. >>>> >>> >>> It's a good point. In the machine code, shifts are likely to use >>> immediate values, and so will be a small win. 
>>> >>> int x = codePoint >>> 16; >>> return x != 0 && x < 0x11; >>> >>> (On modern hardware, these optimizations >>> are less valuable than they used to be; >>> ordinary integer arithmetic is almost free) >>> >>> >> >> I'm not convinced if the proposed code is really better...a "small win". >> >> Without seeing the real native machine code generated, I'm not sure >> if >> >> 0: iload_0 1: bipush 16 >> 3: iushr 4: istore_1 5: iload_1 >> 6: ifeq 19 >> >> is really better than >> >> 0: iload_0 1: ldc #2 // int 65536 >> 3: if_icmplt 16 >> >> >> for bmp character case, especially given the existing code has better >> readability and yes, shorter.... >> >> Yes, shift might be able to use the immediate values, but it still >> needs to handle the "operands" >> and it is an extra operation. The only chance the new one might be >> better is that the "ifeq" is >> faster than "if_icmplt", but have not worked on the instruction set >> level for too long, so I can't >> tell (kinda remember you have to check the "circles" of each >> operation to see which one is >> "faster" during my old gcc compiler day) >> >> OK, convince me:-) >> >> -Sherman >> >> >> public class Character extends java.lang.Object { >> public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; >> >> public static final int MAX_CODE_POINT = 1114111; >> >> public Character(); >> Code: >> 0: aload_0 1: invokespecial #1 // >> Method java/lang/Object."":()V >> 4: return >> public static boolean isSupplementaryCodePoint(int); >> Code: >> 0: iload_0 1: ldc #2 // >> int 65536 >> 3: if_icmplt 16 >> 6: iload_0 7: ldc #3 // >> int 1114111 >> 9: if_icmpgt 16 >> 12: iconst_1 13: goto 17 >> 16: iconst_0 17: ireturn >> public static boolean isSupplementaryCodePoint_new(int); >> Code: >> 0: iload_0 1: bipush 16 >> 3: iushr 4: istore_1 5: iload_1 >> 6: ifeq 19 >> 9: iload_1 10: bipush 17 >> 12: if_icmpge 19 >> 15: iconst_1 16: goto 20 >> 19: iconst_0 20: ireturn } >> >> From Ulf.Zibis at gmx.de Tue Mar 16 20:36:16 2010 
From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 21:36:16 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003161328i5334041fre25eb31d9fa53e9e@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4B9FE077.3060608@gmx.de> <1ccfd1c11003161328i5334041fre25eb31d9fa53e9e@mail.gmail.com> Message-ID: <4B9FEBC0.8070200@gmx.de> Am 16.03.2010 21:28, schrieb Martin Buchholz: > On Tue, Mar 16, 2010 at 12:48, Ulf Zibis wrote: > > > Hmmmm...... > > Yes, you've convinced me! > Done. > THE meeting had it's success. ;-) > >>>> - Same shift magic would enhance isISOControl(), isHighSurrogate(), >>>> isLowSurrogate(), in particular if latter occur consecutive. >>>> >>>> >>> isISOControl - yes, others - I am not convinced. >>> >>> >> If virtually shifted by 8, HotSpot could use cheaper 1-byte compare on the >> high byte. >> Additionally, those methods are often used consecutively, so all 4 compares >> would benefit from. >> > Sorry, I'm still not convinced for the surrogate testing methods. > Almost all chars are less than MIN_SURROGATE, so you have to beat > the already amazingly cheap > x>= MIN_SURROGATE. > Good point, but ... ... what about : 6933327 - Use shifted addressing modes instead of shift instuctions and internal review ID of 1735166 -Ulf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From martinrb at google.com Tue Mar 16 20:46:26 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 16 Mar 2010 13:46:26 -0700 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B9FE999.3050106@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995F9C.3070705@sun.com> <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com> <4B9FE999.3050106@gmx.de> Message-ID: <1ccfd1c11003161346o3496bc39gfa40583abd6bb8c9@mail.gmail.com> On Tue, Mar 16, 2010 at 13:27, Ulf Zibis wrote: > Am 13.03.2010 00:29, schrieb Martin Buchholz: > > Won't you like to add:
>     * Note: In combination with {@link #isBMPCodePoint(int)} this
>     * method should be in 2nd place to permit additional HotSpot compiler
>     * optimization. Example:
>     *
>     *     if (Character.isBMPCodePoint(codePoint))
>     *         ...;
>     *     else if (Character.isSupplementaryCodePoint(codePoint))
>     *         ...;
>     *     else
>     *         ...;
>     *
No. This kind of implementation-specific comment is not traditionally put in public javadoc (it's considered OK in private comments). Also, we should not inflict our dangerous micro-optimization disease on others. >> 6934265: Add public method Character.isBMPCodePoint >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint >> > > Additionally please move static final int SIZE = 16 to one of the first > lines of the code. > See: https://bugs.openjdk.java.net/attachment.cgi?id=178&action=diff No. Although I agree with you that SIZE would be better near the top of the class, I am not going to move it, at least not now. For consistency, the SIZE fields in related classes like Short should be moved as well. >> 6934270: Remove javac warnings from Character.java >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings >> 6934271: Better handling of longer utf-8 sequences >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling >> 6666666: Optimize bit-twiddling in Bits.java >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java >> > > Hm, I can't see any difference that would merit to see it as > micro-optimization. Am I blind? Bytecode is smaller. >> Now I need to go off to my micro-optimizers-anonymous meeting. >> > > Oh, you are coming to Cologne, Germany. Nice to meet you personally. The last time I was in Cologne was for a job interview. Unfortunately, there were only positions in the insurance industry at the time.
Martin From martinrb at google.com Tue Mar 16 20:50:55 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 16 Mar 2010 13:50:55 -0700 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4B9FEBC0.8070200@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4B9FE077.3060608@gmx.de> <1ccfd1c11003161328i5334041fre25eb31d9fa53e9e@mail.gmail.com> <4B9FEBC0.8070200@gmx.de> Message-ID: <1ccfd1c11003161350n25d12225kfb7621dee1a1d415@mail.gmail.com> On Tue, Mar 16, 2010 at 13:36, Ulf Zibis wrote: > Am 16.03.2010 21:28, schrieb Martin Buchholz: > Sorry, I'm still not convinced for the surrogate testing methods. > Almost all chars are less than MIN_SURROGATE, so you have to beat > the already amazingly cheap > x >= MIN_SURROGATE. > > > Good point, but ... > ... what about : > 6933327 - Use shifted addressing modes instead of shift instuctions > and internal review ID of 1735166 Although I do worry about what hotspot will do with my bytecode (as you know), I mostly try to think more abstractly about the JIT and simply produce high-quality JIT-friendly bytecode. Your considerations in 6933327 seem valuable, but are targeted at only one runtime compiler, on only one machine architecture. 
Martin From Ulf.Zibis at gmx.de Tue Mar 16 20:58:33 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 21:58:33 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4B9FF887.6080402@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <4B9FE5A0.7010409@gmx.de> <4B9FF887.6080402@sun.com> Message-ID: <4B9FF0F9.4090606@gmx.de> Very descriptive visualization. :-) I mean: if ( 6932837 && (review ID of 1735166) gets fixed ) { existing isSupplementaryCodePoint() impl is BEST } else if ( 6933327 gets fixed ) { the proposed is better } else { proposed is still little better than existing isSupplementaryCodePoint() impl } Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[], int, int) would profit from consecutive use of isBMPCodePoint + isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate. > So we will only see any benefit if they "don't fix 6932837, but fix 6933327"? fix 6932837 wouldn't harm, but other code, using shift by 8 | 16 would benefit from. -Ulf Am 16.03.2010 22:30, schrieb Xueming Shen: > What did you mean "Hotspot could benefit from..." > > Are you saying? > > if ( 6932837 gets fixed ) { > existing isSupplementaryCodePoint() impl is better > } else if ( 6933327 gets fixed ) { > the proposed is better > } else { > existing isSupplementaryCodePoint() impl might still be better > } > > So we will only see any benefit if they "don't fix 6932837, but fix > 6933327"? 
> > -Sherman > > > Ulf Zibis wrote: >> Here you can see, how HotSpot could benefit from that bit twiddling: >> >> I've filed some bugs against HotSpot to optimize those cases: >> 6932837 >> - Better use unsigned jump if one of the range limits is 0 >> 6933327 >> - Use shifted addressing modes instead of shift instuctions >> >> -Ulf >> >> >> Am 16.03.2010 21:06, schrieb Xueming Shen: >>> Martin Buchholz wrote: >>>> Therefore the existing implementation >>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>> && codePoint<= MAX_CODE_POINT; >>>>>> >>>>>> will almost always perform just one comparison against a constant, >>>>>> which is hard to beat. >>>>>> >>>>> 1. Wondering: I think there are TWO comparisons. >>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>> against >>>>> only 8 bit values in my case. >>>> >>>> It's a good point. In the machine code, shifts are likely to use >>>> immediate values, and so will be a small win. >>>> >>>> int x = codePoint >>> 16; >>>> return x != 0 && x < 0x11; >>>> >>>> (On modern hardware, these optimizations >>>> are less valuable than they used to be; >>>> ordinary integer arithmetic is almost free) >>>> >>> >>> I'm not convinced if the proposed code is really better...a "small >>> win". >>> >>> Without seeing the real native machine code generated, I'm not sure >>> if >>> >>> 0: iload_0 1: bipush 16 >>> 3: iushr 4: istore_1 5: iload_1 >>> 6: ifeq 19 >>> >>> is really better than >>> >>> 0: iload_0 1: ldc #2 // int >>> 65536 >>> 3: if_icmplt 16 >>> >>> >>> for bmp character case, especially given the existing code has >>> better readability and yes, shorter.... >>> >>> Yes, shift might be able to use the immediate values, but it still >>> needs to handle the "operands" >>> and it is an extra operation. 
The only chance the new one might be >>> better is that the "ifeq" is >>> faster than "if_icmplt", but have not worked on the instruction set >>> level for too long, so I can't >>> tell (kinda remember you have to check the "circles" of each >>> operation to see which one is >>> "faster" during my old gcc compiler day) >>> >>> OK, convince me:-) >>> >>> -Sherman >>> >>> >>> public class Character extends java.lang.Object { >>> public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; >>> >>> public static final int MAX_CODE_POINT = 1114111; >>> >>> public Character(); >>> Code: >>> 0: aload_0 1: invokespecial #1 // >>> Method java/lang/Object."<init>":()V >>> 4: return public static boolean >>> isSupplementaryCodePoint(int); >>> Code: >>> 0: iload_0 1: ldc #2 // >>> int 65536 >>> 3: if_icmplt 16 >>> 6: iload_0 7: ldc #3 // >>> int 1114111 >>> 9: if_icmpgt 16 >>> 12: iconst_1 13: goto 17 >>> 16: iconst_0 17: ireturn public static boolean >>> isSupplementaryCodePoint_new(int); >>> Code: >>> 0: iload_0 1: bipush 16 >>> 3: iushr 4: istore_1 5: iload_1 >>> 6: ifeq 19 >>> 9: iload_1 10: bipush 17 >>> 12: if_icmpge 19 >>> 15: iconst_1 16: goto 20 >>> 19: iconst_0 20: ireturn } >>> >>> > >
From martinrb at google.com Tue Mar 16 21:09:13 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 16 Mar 2010 14:09:13 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4B9FE4DD.1090405@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> Message-ID: <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com>

On Tue, Mar 16, 2010 at 13:06, Xueming Shen wrote:
> Martin Buchholz wrote:
>>
>> Therefore the existing implementation
>>>>
>>>>   return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT
>>>>       && codePoint<= MAX_CODE_POINT;
>>>>
>>>> will almost always perform just one comparison against a constant,
>>>> which is hard to beat.
>>>>
>>>
>>> 1. Wondering: I think there are TWO comparisons.
>>> 2. Those comparisons need to load 32 bit values from machine code,
>>> against
>>> only 8 bit values in my case.
>>>
>>
>> It's a good point. In the machine code, shifts are likely to use
>> immediate values, and so will be a small win.
>>
>> int x = codePoint >>> 16;
>> return x != 0 && x < 0x11;
>>
>> (On modern hardware, these optimizations
>> are less valuable than they used to be;
>> ordinary integer arithmetic is almost free)
>>
>
> I'm not convinced if the proposed code is really better...a "small win".

The primary theory here is that branches are expensive, and we are reducing them by one.

> Without seeing the real native machine code generated, I'm not sure
> if
>
>    0: iload_0
>    1: bipush        16
>    3: iushr
>    4: istore_1
>    5: iload_1
>    6: ifeq          19
>
> is really better than
>
>    0: iload_0
>    1: ldc           #2    // int 65536
>    3: if_icmplt     16
>
> for bmp character case, especially given the existing code has better
> readability and yes, shorter....

The very latest version of the code is Ulf's readable and optimal (as long as it is inlined)

    public static boolean isSupplementaryCodePoint(int codePoint) {
        return !isBMPCodePoint(codePoint) && isValidCodePoint(codePoint);
    }

Martin

> Yes, shift might be able to use the immediate values, but it still needs to
> handle the "operands"
> and it is an extra operation. The only chance the new one might be better is
> that the "ifeq" is
> faster than "if_icmplt", but have not worked on the instruction set level
> for too long, so I can't
> tell (kinda remember you have to check the "circles" of each operation to
> see which one is
> "faster" during my old gcc compiler day)
>
> OK, convince me:-)
>
> -Sherman
>
> public class Character extends java.lang.Object {
>   public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536;
>
>   public static final int MAX_CODE_POINT = 1114111;
>
>   public Character();
>     Code:
>        0: aload_0
>        1: invokespecial #1    // Method java/lang/Object."<init>":()V
>        4: return
>
>   public static boolean isSupplementaryCodePoint(int);
>     Code:
>        0: iload_0
>        1: ldc           #2    // int 65536
>        3: if_icmplt     16
>        6: iload_0
>        7: ldc           #3    // int 1114111
>        9: if_icmpgt     16
>       12: iconst_1
>       13: goto          17
>       16: iconst_0
>       17: ireturn
>
>   public static boolean isSupplementaryCodePoint_new(int);
>     Code:
>        0: iload_0
>        1: bipush        16
>        3: iushr
>        4: istore_1
>        5: iload_1
>        6: ifeq          19
>        9: iload_1
>       10: bipush        17
>       12: if_icmpge     19
>       15: iconst_1
>       16: goto          20
>       19: iconst_0
>       20: ireturn
} > > From Ulf.Zibis at gmx.de Tue Mar 16 20:27:05 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 21:27:05 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995F9C.3070705@sun.com> <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com> Message-ID: <4B9FE999.3050106@gmx.de> Am 13.03.2010 00:29, schrieb Martin Buchholz: > OK, next round of review. > > I changed my UTF-8 changes to be behavior-preserving, > removing any hint of controversy, and renamed the patch > to "utf8-twiddling". > > I got Ulf in my head, and can't stop micro-optimizing. > I added a new micro-optimizing patch for Bits.java. > Please file a bug. > > 6934268: Better implementation of Character.isValidCodePoint and > isSupplementaryCodePoint() > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint > Won't you like to add: *

Note: In combination with {@link #isBMPCodePoint(int)} this * method should be in 2nd place to permit additional HotSpot compiler * optimization. Example: *

      *     if (Character.isBMPCodePoint(codePoint))
      *         ...;
      *     else if (Character.isSupplementaryCodePoint(codePoint))
      *         ...;
      *     else
      *         ...;
      * 
* > 6934265: Add public method Character.isBMPCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint > Additionally please move static final int SIZE = 16 to one of the first lines of the code. See: https://bugs.openjdk.java.net/attachment.cgi?id=178&action=diff > 6934270: Remove javac warnings from Character.java > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings > 6934271: Better handling of longer utf-8 sequences > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling > 6666666: Optimize bit-twiddling in Bits.java > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java > Hm, I can't see any difference that would merit to see it as micro-optimization. Am I blind? > Now I need to go off to my micro-optimizers-anonymous meeting. > Oh, you are coming to Cologne, Germany. Nice to meet you personally. -Ulf From Xueming.Shen at Sun.COM Tue Mar 16 22:35:16 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Tue, 16 Mar 2010 14:35:16 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> Message-ID: <4BA007A4.2030907@sun.com> Martin Buchholz wrote: > On Tue, Mar 16, 2010 at 13:06, Xueming Shen wrote: > >> Martin Buchholz wrote: >> >>> Therefore the existing implementation >>> >>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>> && codePoint<= MAX_CODE_POINT; >>>>> >>>>> will almost always perform just one comparison against a constant, >>>>> which is hard to beat. >>>>> >>>>> >>>>> >>>> 1. Wondering: I think there are TWO comparisons. >>>> 2. 
Those comparisons need to load 32 bit values from machine code, >>>> against >>>> only 8 bit values in my case. >>>> >>>> >>> It's a good point. In the machine code, shifts are likely to use >>> immediate values, and so will be a small win. >>> >>> int x = codePoint >>> 16; >>> return x != 0 && x < 0x11; >>> >>> (On modern hardware, these optimizations >>> are less valuable than they used to be; >>> ordinary integer arithmetic is almost free) >>> >>> >>> >> I'm not convinced if the proposed code is really better...a "small win". >> > > The primary theory here is that branches are expensive, > and we are reducing them by one. > > There are still two branches in new impl, if you count the "ifeq" and "if_icmpge"(?) We are trying to "optimize" this piece of code with the assumption that the new impl MIGHT help certain vm (hotspot?) to optimize certain use scenario (some consecutive usages), if the compiler and/or the vm are both smart enough at certain point, with no supporting benchmark data? My concern is that the reality might be that this optimization might even hurt the BMP use case (the majority of the possible real world use scenarios) with a 10% bigger bytecode size. 
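For readers comparing the two variants quoted above, here is a minimal sketch (not from the thread) that checks whether the range-based and the shift-based form agree on their results. The class and method names are made up for illustration, and the sketch says nothing about which form is faster.

```java
final class SupplementaryCheck {
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x10000;  // 65536
    static final int MAX_CODE_POINT = 0x10FFFF;               // 1114111

    // Range-based form, as in the existing implementation quoted above.
    static boolean rangeBased(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <= MAX_CODE_POINT;
    }

    // Shift-based form, as proposed in the thread.
    static boolean shiftBased(int codePoint) {
        int x = codePoint >>> 16;   // plane number; large for negative input
        return x != 0 && x < 0x11;
    }

    // Hypothetical helper: true if both forms agree on every value in [from, to].
    static boolean agreeOnRange(int from, int to) {
        for (long c = from; c <= to; c++) {
            if (rangeBased((int) c) != shiftBased((int) c))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnRange(-0x20000, 0x120000));
    }
}
```

The range passed in main covers the interesting boundaries (negative values, 0xFFFF/0x10000 and 0x10FFFF/0x110000); a benchmark would still be needed to settle the performance question raised above.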
-Sherman public class Character extends java.lang.Object { public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; public static final int MAX_CODE_POINT = 1114111; public Character(); Code: 0: aload_0 1: invokespecial #1 // Method java/lang/Object."":()V 4: return public static boolean isSupplementaryCodePoint(int); Code: 0: iload_0 1: ldc #2 // int 65536 3: if_icmplt 16 6: iload_0 7: ldc #3 // int 1114111 9: if_icmpgt 16 12: iconst_1 13: goto 17 16: iconst_0 17: ireturn public static boolean isSupplementaryCodePoint_new(int); Code: 0: iload_0 1: bipush 16 3: iushr 4: istore_1 5: iload_1 6: ifeq 19 9: iload_1 10: bipush 17 12: if_icmpge 19 15: iconst_1 16: goto 20 19: iconst_0 20: ireturn } From martinrb at google.com Tue Mar 16 21:36:18 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 16 Mar 2010 14:36:18 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4B9FF0F9.4090606@gmx.de> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <4B9FE5A0.7010409@gmx.de> <4B9FF887.6080402@sun.com> <4B9FF0F9.4090606@gmx.de> Message-ID: <1ccfd1c11003161436o191295f2r9464d715488cb16d@mail.gmail.com> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis wrote: > Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[], > int, int) would profit from consecutive use of isBMPCodePoint + > isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate. For codePointCountImpl(), I do not agree. For String(int[], int, int), I do agree. 
Here is my latest more readable and more performant implementation: int end = offset + count; // Pass 1: Compute precise size of char[] int n = 0; for (int i = offset; i < end; i++) { int c = codePoints[i]; if (Character.isBMPCodePoint(c)) n += 1; else if (Character.isSupplementaryCodePoint(c)) n += 2; else throw new IllegalArgumentException(Integer.toString(c)); } // Pass 2: Allocate and fill in char[] char[] v = new char[n]; for (int i = offset, j = 0; i < end; i++) { int c = codePoints[i]; if (Character.isBMPCodePoint(c)) { v[j++] = (char) c; } else { Character.toSurrogates(c, v, j); j += 2; } } Martin From Ulf.Zibis at gmx.de Tue Mar 16 21:51:55 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 22:51:55 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003161357p50ab32delb260dd8f16651915@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4B9FE077.3060608@gmx.de> <1ccfd1c11003161357p50ab32delb260dd8f16651915@mail.gmail.com> Message-ID: <4B9FFD7B.2070705@gmx.de> Am 16.03.2010 21:57, schrieb Martin Buchholz: > On Tue, Mar 16, 2010 at 12:48, Ulf Zibis wrote: > >>>> - Same shift magic would enhance isISOControl(), isHighSurrogate(), >>>> isLowSurrogate(), in particular if latter occur consecutive. >>>> >>>> >>> isISOControl - yes, others - I am not convinced. >>> >>> >> If virtually shifted by 8, HotSpot could use cheaper 1-byte compare on the >> high byte. >> Additionally, those methods are often used consecutively, so all 4 compares >> would benefit from. >> >> >>>> 8-bit shift + compare would allow HotSpot to compile to smart 1-byte >>>> immediate op-codes. 
>>>> In encodeBufferLoop() you could use putChar(), putInt() instead put(). >>>> Should perform better. >>>> >>>> >>> I'm not convinced. You would need to assemble bytes into an >>> int, and then break them apart into bytes on the other side? >>> >>> >> Some time ago, I disassembled such code. I could see, that the int was >> copied directly to memory by one 32-bit move instruction. >> In case of using put(byte), I saw 4 8-bit move instructions. >> > Ulf, I'd like to understand this better. > > How are you generating the machine code > (pointer to docs?)? > I must prepare it. Takes some time. > Bits.java is doing byte-oriented put instructions in any case. > If the VM can optimize putInt, it should be able to optimize > the equivalent series of put(byte) as well, no? > Yes, it should, but it doesn't. > Can you provide a small patch that gives an observable > performance improvement in a micro-benchmark? > I'll try. > >> I not have dissassembled if a 3-byte value first would be collected in a >> 3-byte byte[] and then copied by put(byte[]). Maybe HotSpot could optimize >> here too. >> >> Try it out. 2 will see more than 1. Maybe I was in error. >> >> BTW: for the same optimization, I would like to have putInt() and putLong() >> in Charbuffer, ShortBuffer and for the latter in IntBuffer. >> > Perhaps better to get the VM to optimize a series of puts into > a single instruction? > I have such an RFE in mind. 
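As a point of reference for the putInt() versus put(byte) discussion above, the following sketch (an editorial illustration, not code from the thread) shows that a single putInt() and four put(byte) calls leave the same bytes in a big-endian ByteBuffer. The class and method names are hypothetical, and the sketch makes no claim about what machine code HotSpot generates for either form.

```java
import java.nio.ByteBuffer;

final class PutIntSketch {
    // Writes v with a single putInt(); ByteBuffer is big-endian by default.
    static byte[] viaPutInt(int v) {
        ByteBuffer bb = ByteBuffer.allocate(4);
        bb.putInt(v);
        return bb.array();
    }

    // Writes the same value as four separate bytes, most significant first.
    static byte[] viaFourPuts(int v) {
        ByteBuffer bb = ByteBuffer.allocate(4);
        bb.put((byte) (v >>> 24));
        bb.put((byte) (v >>> 16));
        bb.put((byte) (v >>> 8));
        bb.put((byte) v);
        return bb.array();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.equals(
            viaPutInt(0x12345678), viaFourPuts(0x12345678)));
    }
}
```

Whether the VM turns the four single-byte stores into one 32-bit store is exactly the open question in the messages above; only a disassembly or a micro-benchmark can answer it.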
-Ulf From martinrb at google.com Tue Mar 16 22:00:34 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 16 Mar 2010 15:00:34 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA007A4.2030907@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> Message-ID: <1ccfd1c11003161500o3d41e3felb2ab619f27095082@mail.gmail.com> I am recanting my previous support for any change to isSupplementaryCodePoint. I think my brain (or maybe Ulf's brain) tricked me into thinking that the considerations for isValidCodePoint and isBMPCodePoint also apply to isSupplementaryCodePoint. Sorry. I renamed my patch file from isSupplementaryCodePoint to isValidCodePoint. 6934268: Better implementation of Character.isValidCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint 6934265: Add public method Character.isBMPCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint 6934270: Remove javac warnings from Character.java http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings 6934271: Better handling of longer utf-8 sequences http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling 6935172: Optimize bit-twiddling in Bits.java http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java Martin On Tue, Mar 16, 2010 at 15:35, Xueming Shen wrote: > Martin Buchholz wrote: >> >> On Tue, Mar 16, 2010 at 13:06, Xueming Shen wrote: >> >>> >>> Martin Buchholz wrote: >>> >>>> >>>> Therefore the existing implementation >>>> >>>>>> >>>>>> ?return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>> ? ? ? ? ? 
&& ?codePoint<= MAX_CODE_POINT; >>>>>> >>>>>> will almost always perform just one comparison against a constant, >>>>>> which is hard to beat. >>>>>> >>>>>> >>>>>> >>>>> >>>>> 1. Wondering: I think there are TWO comparisons. >>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>> against >>>>> only 8 bit values in my case. >>>>> >>>>> >>>> >>>> It's a good point. ?In the machine code, shifts are likely to use >>>> immediate values, and so will be a small win. >>>> >>>> int x = codePoint >>> 16; >>>> return x != 0 && x < 0x11; >>>> >>>> (On modern hardware, these optimizations >>>> are less valuable than they used to be; >>>> ordinary integer arithmetic is almost free) >>>> >>>> >>>> >>> >>> I'm not convinced if the proposed code is really better...a "small win". >>> >> >> The primary theory here is that branches are expensive, >> and we are reducing them by one. >> >> > > There are still two branches in new impl, if you count the "ifeq" and > "if_icmpge"(?) > > We are trying to "optimize" this piece of code with the assumption that the > new impl MIGHT help certain vm (hotspot?) > to optimize certain use scenario (some consecutive usages), if the compiler > and/or the vm are both smart enough at certain > point, with no supporting benchmark data? > > My concern is that the reality might be that this optimization might even > hurt the BMP use > case (the majority of the possible real world use scenarios) with a 10% > bigger bytecode size. > > -Sherman > > > > public class Character extends java.lang.Object { > ?public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; > > ?public static final int MAX_CODE_POINT = 1114111; > > ?public Character(); > ? Code: > ? ? ?0: aload_0 ? ? ? ? ? ? 1: invokespecial #1 ? ? ? ? ? ? ? ? ?// Method > java/lang/Object."":()V > ? ? ?4: return > ?public static boolean isSupplementaryCodePoint(int); > ? Code: > ? ? ?0: iload_0 ? ? ? ? ? ? 1: ldc ? ? ? ? ? #2 ? ? ? ? ? ? ? ? ?// int > 65536 > ? ? 
?3: if_icmplt ? ? 16 > ? ? ?6: iload_0 ? ? ? ? ? ? 7: ldc ? ? ? ? ? #3 ? ? ? ? ? ? ? ? ?// int > 1114111 > ? ? ?9: if_icmpgt ? ? 16 > ? ? 12: iconst_1 ? ? ? ? ? 13: goto ? ? ? ? ?17 > ? ? 16: iconst_0 ? ? ? ? ? 17: ireturn > ?public static boolean isSupplementaryCodePoint_new(int); > ? Code: > ? ? ?0: iload_0 ? ? ? ? ? ? 1: bipush ? ? ? ?16 > ? ? ?3: iushr ? ? ? ? ? ? ? 4: istore_1 > ? ? ?5: iload_1 ? ? ? ? ? ? 6: ifeq ? ? ? ? ?19 > ? ? ?9: iload_1 ? ? ? ? ? ?10: bipush ? ? ? ?17 > ? ? 12: if_icmpge ? ? 19 > ? ? 15: iconst_1 ? ? ? ? ? 16: goto ? ? ? ? ?20 > ? ? 19: iconst_0 ? ? ? ? ? 20: ireturn ? ? ? } > > > > > > From Ulf.Zibis at gmx.de Tue Mar 16 22:25:40 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 16 Mar 2010 23:25:40 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA007A4.2030907@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> Message-ID: <4BA00564.4010104@gmx.de> Am 16.03.2010 23:35, schrieb Xueming Shen: > Martin Buchholz wrote: >> On Tue, Mar 16, 2010 at 13:06, Xueming Shen >> wrote: >>> Martin Buchholz wrote: >>>> Therefore the existing implementation >>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>> && codePoint<= MAX_CODE_POINT; >>>>>> >>>>>> will almost always perform just one comparison against a constant, >>>>>> which is hard to beat. >>>>>> >>>>>> >>>>> 1. Wondering: I think there are TWO comparisons. >>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>> against >>>>> only 8 bit values in my case. >>>>> >>>> It's a good point. In the machine code, shifts are likely to use >>>> immediate values, and so will be a small win. 
>>>> >>>> int x = codePoint >>> 16; >>>> return x != 0 && x < 0x11; >>>> >>>> (On modern hardware, these optimizations >>>> are less valuable than they used to be; >>>> ordinary integer arithmetic is almost free) >>>> >>>> >>> I'm not convinced if the proposed code is really better...a "small >>> win". >> >> The primary theory here is that branches are expensive, >> and we are reducing them by one. >> > > There are still two branches in new impl, if you count the "ifeq" and > "if_icmpge"(?) True. But for (int i = offset; i < offset + count; i++) { int c = codePoints[i]; byte plane = (byte)(c >>> 16); if (plane == 0) n += 1; else if (plane >= 0 && plane <= (byte)0x11) n += 2; else throw new IllegalArgumentException(Integer.toString(c)); } has too only 2 branches if 6932837 would be fixed, 3 otherwise, and additionally could benefit from tiny 8-bit comparisons. The shift additionally could be omitted on CPU's which can benefit from 6933327. Instead: for (int i = offset; i < offset + count; i++) { int c = codePoints[i]; if (c >= Character.MIN_VALUE && c <= Character.MAX_VALUE) n += 1; else if (c >= Character.MIN_SUPPLEMENTARY_CODE_POINT && c <= Character.MAX_SUPPLEMENTARY_CODE_POINT) n += 2; else throw new IllegalArgumentException(Integer.toString(c)); } needs 4 branches and 4 32-bit comparisons. > > We are trying to "optimize" this piece of code with the assumption > that the new impl MIGHT help certain vm (hotspot?) > to optimize certain use scenario (some consecutive usages), if the > compiler and/or the vm are both smart enough at certain > point, with no supporting benchmark data? > > My concern is that the reality might be that this optimization might > even hurt the BMP use > case (the majority of the possible real world use scenarios) with a > 10% bigger bytecode size. 
> > -Sherman > > > > public class Character extends java.lang.Object { > public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; > > public static final int MAX_CODE_POINT = 1114111; > > public Character(); > Code: > 0: aload_0 1: invokespecial #1 // > Method java/lang/Object."":()V > 4: return > public static boolean isSupplementaryCodePoint(int); > Code: > 0: iload_0 1: ldc #2 // > int 65536 > 3: if_icmplt 16 > 6: iload_0 7: ldc #3 // > int 1114111 > 9: if_icmpgt 16 > 12: iconst_1 13: goto 17 > 16: iconst_0 17: ireturn > public static boolean isSupplementaryCodePoint_new(int); > Code: > 0: iload_0 1: bipush 16 > 3: iushr 4: istore_1 > 5: iload_1 6: ifeq 19 > 9: iload_1 10: bipush 17 > 12: if_icmpge 19 > 15: iconst_1 16: goto 20 > 19: iconst_0 20: ireturn } > > > > > > From Ulf.Zibis at gmx.de Tue Mar 16 23:14:08 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 00:14:08 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003161436o191295f2r9464d715488cb16d@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <4B9FE5A0.7010409@gmx.de> <4B9FF887.6080402@sun.com> <4B9FF0F9.4090606@gmx.de> <1ccfd1c11003161436o191295f2r9464d715488cb16d@mail.gmail.com> Message-ID: <4BA010C0.6020204@gmx.de> Am 16.03.2010 22:36, schrieb Martin Buchholz: > On Tue, Mar 16, 2010 at 13:58, Ulf Zibis wrote: > > >> Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[], >> int, int) would profit from consecutive use of isBMPCodePoint + >> isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate. >> > For codePointCountImpl(), I do not agree. 
> 1-byte comparisons have less footprint, in doubt load faster from memory, need less L1-CPU-cache, on small/RISC/etc. CPU's would be faster and therefore should enhance overall performance. The shift additionally could be omitted on CPU's which can benefit from 6933327. > For String(int[], int, int), I do agree. > > Here is my latest more readable and more performant implementation: > > int end = offset + count; > > // Pass 1: Compute precise size of char[] > int n = 0; > for (int i = offset; i< end; i++) { > int c = codePoints[i]; > if (Character.isBMPCodePoint(c)) > n += 1; > else if (Character.isSupplementaryCodePoint(c)) > n += 2; > else throw new IllegalArgumentException(Integer.toString(c)); > } > > // Pass 2: Allocate and fill in char[] > char[] v = new char[n]; > for (int i = offset, j = 0; i< end; i++) { > int c = codePoints[i]; > if (Character.isBMPCodePoint(c)) { > v[j++] = (char) c; > } else { > Character.toSurrogates(c, v, j); > j += 2; > } > } > I suggest: // Pass 2: Allocate and fill in char[] char[] v = new char[n]; for (int i = end; n > 0; ) { int c = codePoints[--i]; if (Character.isBMPCodePoint(c)) v[--n] = (char)c; else Character.toSurrogates(c, v, n -= 2); } - saves 1 variable (=reduces register pressure) - determining of the loop end against 0 is faster than against "end", see: 6932855 BTW: int end = offset + count; could be saved, as VM would do that, for sure in HotSpot c2 compiler. -Ulf -------------- next part -------------- An HTML attachment was scrubbed... 
From martinrb at google.com Tue Mar 16 23:41:08 2010
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 16 Mar 2010 16:41:08 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA010C0.6020204@gmx.de>
References: <4A95079A.8080803@gmx.de>
 <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com>
 <4B8E3DA3.7090902@gmx.de>
 <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>
 <4B9FE4DD.1090405@sun.com> <4B9FE5A0.7010409@gmx.de>
 <4B9FF887.6080402@sun.com> <4B9FF0F9.4090606@gmx.de>
 <1ccfd1c11003161436o191295f2r9464d715488cb16d@mail.gmail.com>
 <4BA010C0.6020204@gmx.de>
Message-ID: <1ccfd1c11003161641o202711aerad4e686d21a2cf53@mail.gmail.com>

On Tue, Mar 16, 2010 at 16:14, Ulf Zibis wrote:
> On 16.03.2010 22:36, Martin Buchholz wrote:
>
> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis wrote:
>
> Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[],
> int, int) would profit from consecutive use of isBMPCodePoint +
> isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate.
>
> For codePointCountImpl(), I do not agree.
>
> 1-byte comparisons have less footprint, in doubt load faster from memory,
> need less L1-CPU-cache, on small/RISC/etc. CPU's would be faster and
> therefore should enhance overall performance.
> The shift additionally could be omitted on CPU's which can benefit from
> 6933327.

I am not convinced.  Using byte for local variables is unlikely to
give any performance benefit.  The only way use of byte can be
a win is if you read/write a bunch of them at once from memory.
I think of byte as a compression scheme for int.

> For String(int[], int, int), I do agree.
>
> Here is my latest more readable and more performant implementation:
>
>         int end = offset + count;
>
>         // Pass 1: Compute precise size of char[]
>         int n = 0;
>         for (int i = offset; i < end; i++) {
>             int c = codePoints[i];
>             if (Character.isBMPCodePoint(c))
>                 n += 1;
>             else if (Character.isSupplementaryCodePoint(c))
>                 n += 2;
>             else throw new IllegalArgumentException(Integer.toString(c));
>         }
>
>         // Pass 2: Allocate and fill in char[]
>         char[] v = new char[n];
>         for (int i = offset, j = 0; i < end; i++) {
>             int c = codePoints[i];
>             if (Character.isBMPCodePoint(c)) {
>                 v[j++] = (char) c;
>             } else {
>                 Character.toSurrogates(c, v, j);
>                 j += 2;
>             }
>         }
>
>
> I suggest:
>
>         // Pass 2: Allocate and fill in char[]
>         char[] v = new char[n];
>         for (int i = end; n > 0; ) {
>             int c = codePoints[--i];
>             if (Character.isBMPCodePoint(c))
>                 v[--n] = (char)c;
>             else
>                 Character.toSurrogates(c, v, n -= 2);
>         }
>
> - saves 1 variable (=reduces register pressure)
> - determining of the loop end against 0 is faster than against "end", see:
> 6932855

Perhaps, but this exceeds my micro-optimization threshold.

> BTW:
>     int end = offset + count;
> could be saved, as VM would do that, for sure in HotSpot c2 compiler.
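To make the two "Pass 2" variants above easier to compare, here is a self-contained sketch (an editorial illustration) of the forward fill and the suggested backward fill. Some substitutions are assumptions of this sketch: Character.toSurrogates is not accessible outside java.lang, so the public Character.toChars(int, char[], int) is used instead, and the BMP test is a local helper named isBmp. The class and method names are hypothetical.

```java
final class FillSketch {
    // Local stand-in for the BMP test discussed in the thread.
    static boolean isBmp(int c) {
        return c >>> 16 == 0;
    }

    // Pass 1: compute the precise char[] size, rejecting invalid code points.
    static int charCount(int[] codePoints, int offset, int count) {
        int n = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            if (isBmp(c))
                n += 1;
            else if (Character.isSupplementaryCodePoint(c))
                n += 2;
            else
                throw new IllegalArgumentException(Integer.toString(c));
        }
        return n;
    }

    // Pass 2, forward: fill from the first code point to the last.
    static char[] fillForward(int[] codePoints, int offset, int count) {
        int end = offset + count;
        char[] v = new char[charCount(codePoints, offset, count)];
        for (int i = offset, j = 0; i < end; i++) {
            int c = codePoints[i];
            if (isBmp(c))
                v[j++] = (char) c;
            else
                j += Character.toChars(c, v, j);  // writes 2 chars, returns 2
        }
        return v;
    }

    // Pass 2, backward: fill from the last code point, counting n down to 0.
    static char[] fillBackward(int[] codePoints, int offset, int count) {
        int n = charCount(codePoints, offset, count);
        char[] v = new char[n];
        for (int i = offset + count; n > 0; ) {
            int c = codePoints[--i];
            if (isBmp(c))
                v[--n] = (char) c;
            else
                Character.toChars(c, v, n -= 2);  // surrogate pair at v[n], v[n+1]
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = {0x41, 0x1F600, 0x42};
        System.out.println(java.util.Arrays.equals(
            fillForward(cps, 0, 3), fillBackward(cps, 0, 3)));
    }
}
```

Both variants produce the same array; whether the backward loop is measurably faster is the point under dispute above and is not something this sketch can show.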
> > -Ulf > > Martin From Ulf.Zibis at gmx.de Wed Mar 17 00:46:54 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 01:46:54 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003161641o202711aerad4e686d21a2cf53@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <4B9FE5A0.7010409@gmx.de> <4B9FF887.6080402@sun.com> <4B9FF0F9.4090606@gmx.de> <1ccfd1c11003161436o191295f2r9464d715488cb16d@mail.gmail.com> <4BA010C0.6020204@gmx.de> <1ccfd1c11003161641o202711aerad4e686d21a2cf53@mail.gmail.com> Message-ID: <4BA0267E.7080009@gmx.de> Am 17.03.2010 00:41, schrieb Martin Buchholz: > On Tue, Mar 16, 2010 at 16:14, Ulf Zibis wrote: > >> Am 16.03.2010 22:36, schrieb Martin Buchholz: >> >> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis wrote: >> >> >> >> Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[], >> int, int) would profit from consecutive use of isBMPCodePoint + >> isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate. >> >> >> For codePointCountImpl(), I do not agree. >> >> >> 1-byte comparisons have less footprint, in doubt load faster from memory, >> need less L1-CPU-cache, on small/RISC/etc. CPU's would be faster and >> therefore should enhance overall performance. >> The shift additionally could be omitted on CPU's which can benefit from >> 6933327. >> 1) I agree, this is academical. 
2) should better be optimized by VM, but isn't at this time see: Just filed, no ID yet: - Transform comparisons against odd border to even border (Review ID: 1735166) - Use as less bits as necessary 3) didn't you say, we should write code without referring on VM vendor specific optimizations 4) Regardless the 8-bit/32-bit arguments, if we subtract 0xd800/0xdc00, I guess, we could benefit from 6932837 - Better use unsigned jump if one of the range limits is 0 for (int i = offset; i < endIndex; ) { n++; byte highByte = (byte)((a[i++] >>> 8) - 0xd8); if (highByte >= 0 && highByte < 0x4) { if (i < endIndex && (highByte = (byte)((a[i] >>> 8) - 0xdc)) >= 0 && highByte < 0x4) { i++; } } } > I am not convinced. Using byte for local variables is unlikely to > give any performance benefit. The only way use of byte can be > a win is if you read/write a bunch of them at once from memory. > I think of byte as a compression scheme for int. > > >> For String(int[], int, int), I do agree. >> >> Here is my latest more readable and more performant implementation: >> >> int end = offset + count; >> >> // Pass 1: Compute precise size of char[] >> int n = 0; >> for (int i = offset; i< end; i++) { >> int c = codePoints[i]; >> if (Character.isBMPCodePoint(c)) >> n += 1; >> else if (Character.isSupplementaryCodePoint(c)) >> n += 2; >> else throw new IllegalArgumentException(Integer.toString(c)); >> } >> >> // Pass 2: Allocate and fill in char[] >> char[] v = new char[n]; >> for (int i = offset, j = 0; i< end; i++) { >> int c = codePoints[i]; >> if (Character.isBMPCodePoint(c)) { >> v[j++] = (char) c; >> } else { >> Character.toSurrogates(c, v, j); >> j += 2; >> } >> } >> >> >> I suggest: >> >> // Pass 2: Allocate and fill in char[] >> char[] v = new char[n]; >> for (int i = end; n> 0; ) { >> int c = codePoints[--i]; >> if (Character.isBMPCodePoint(c)) >> v[--n] = (char)c; >> else >> Character.toSurrogates(c, v, n -= 2); >> } >> >> - saves 1 variable (=reduces register pressure) >> - 
determining of the loop end against 0 is faster than against "end", see:
>> 6932855
>>
> Perhaps, but this exceeds my micro-optimization threshold.
>

:-(

-Ulf

From Ulf.Zibis at gmx.de Wed Mar 17 01:14:36 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 17 Mar 2010 02:14:36 +0100
Subject: request for paired constants in j.l.Character
Message-ID: <4BA02CFC.8010907@gmx.de>

In java.lang.Character we have:

    public static final char MIN_VALUE = '\u0000';
    public static final char MAX_VALUE = '\uFFFF';
    public static final int MIN_CODE_POINT = 0x000000;
    public static final int MAX_CODE_POINT = 0X10FFFF;
    public static final int MIN_SUPPLEMENTARY_CODE_POINT = MAX_VALUE + 1;

As we have MIN_CODE_POINT, which duplicates MIN_VALUE, IMO we could
additionally have

    public static final int MAX_SUPPLEMENTARY_CODE_POINT = MAX_CODE_POINT;

It would look better and meet many users' expectation of finding those
MIN/MAX constants as a pair.

Does anybody agree with me?
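A short sketch of how the proposed pair might read in a range check. MAX_SUPPLEMENTARY_CODE_POINT is the constant proposed in this message and, to my knowledge, is not part of java.lang.Character, so it is declared locally here; the class name PairedConstants is hypothetical.

```java
final class PairedConstants {
    // Proposed constant from the message above; declared locally because
    // it is an assumption of this sketch, not an existing JDK field.
    static final int MAX_SUPPLEMENTARY_CODE_POINT = Character.MAX_CODE_POINT;

    // Range check written with the MIN/MAX pair.
    static boolean isSupplementary(int c) {
        return c >= Character.MIN_SUPPLEMENTARY_CODE_POINT
            && c <= MAX_SUPPLEMENTARY_CODE_POINT;
    }

    public static void main(String[] args) {
        System.out.println(isSupplementary(0x1F600));
    }
}
```

The value is identical to MAX_CODE_POINT, so the proposal is purely about naming symmetry, not behavior.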

-Ulf

From Ulf.Zibis at gmx.de Wed Mar 17 01:20:35 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 17 Mar 2010 02:20:35 +0100
Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint
In-Reply-To: <1ccfd1c11003161346o3496bc39gfa40583abd6bb8c9@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com>
 <4B92C263.9020404@gmx.de>
 <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com>
 <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de>
 <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com>
 <4B995F9C.3070705@sun.com>
 <1ccfd1c11003121529r22651bfcnfca6435311d707a6@mail.gmail.com>
 <4B9FE999.3050106@gmx.de>
 <1ccfd1c11003161346o3496bc39gfa40583abd6bb8c9@mail.gmail.com>
Message-ID: <4BA02E63.306@gmx.de>

On 16.03.2010 21:46, Martin Buchholz wrote:
> On Tue, Mar 16, 2010 at 13:27, Ulf Zibis wrote:
>
>> On 13.03.2010 00:29, Martin Buchholz wrote:
>>
>> Won't you like to add:
>>      * Note: In combination with {@link #isBMPCodePoint(int)} this
>>      * method should be in 2nd place to permit additional HotSpot compiler
>>      * optimization. Example:
>>      *
>>      *     if (Character.isBMPCodePoint(codePoint))
>>      *         ...;
>>      *     else if (Character.isSupplementaryCodePoint(codePoint))
>>      *         ...;
>>      *     else
>>      *         ...;
>>      *
>>
> No.
>
> This kind of implementation-specific comment is not
> traditionally put in public javadoc (it's considered OK
> in private comments).  Also, we should not inflict our
> dangerous micro-optimization disease on others.
>

Hm, I believe I've seen similar things in javadoc, but I can agree to
dropping the part.

Would "In combination with {@link #isBMPCodePoint(int)} this method
should be in 2nd place to permit additional VM optimization." be better?

-Ulf

From weijun.wang at sun.com Wed Mar 17 01:55:37 2010
From: weijun.wang at sun.com (weijun.wang at sun.com)
Date: Wed, 17 Mar 2010 01:55:37 +0000
Subject: hg: jdk7/tl/jdk: 6868865: Test: sun/security/tools/jarsigner/oldsig.sh fails under all platforms
Message-ID: <20100317015556.BC14844E6E@hg.openjdk.java.net>

Changeset: 0500f7306cbe
Author:    weijun
Date:      2010-03-17 09:55 +0800
URL:       http://hg.openjdk.java.net/jdk7/tl/jdk/rev/0500f7306cbe

6868865: Test: sun/security/tools/jarsigner/oldsig.sh fails under all platforms
Reviewed-by: wetmore

! test/sun/security/tools/jarsigner/oldsig.sh

From Ulf.Zibis at gmx.de Wed Mar 17 08:36:08 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Wed, 17 Mar 2010 09:36:08 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA00564.4010104@gmx.de>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com>
 <4B8DA070.3040306@gmx.de>
 <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com>
 <4B8E3DA3.7090902@gmx.de>
 <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com>
 <4B9FE4DD.1090405@sun.com>
 <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com>
 <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de>
Message-ID: <4BA09478.90809@gmx.de>

Oops, correction:

But
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            byte plane = (byte)(c >>> 16);
            if (plane == 0)
                n += 1;
            else if (plane <= (byte)0x11)
                n += 2;
            else throw new IllegalArgumentException(Integer.toString(c));
        }
also has only 2 branches and additionally could benefit from tiny 8-bit
comparisons.
The shift could additionally be omitted on CPUs which can benefit from
6933327.

Instead:
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            if (c >= Character.MIN_VALUE &&
                    c <= Character.MAX_VALUE)
                n += 1;
            else if (c >= Character.MIN_SUPPLEMENTARY_CODE_POINT &&
                    c <= Character.MAX_SUPPLEMENTARY_CODE_POINT)
                n += 2;
            else throw new IllegalArgumentException(Integer.toString(c));
        }
needs 4 branches and 4 32-bit comparisons.
-Ulf From Ulf.Zibis at gmx.de Wed Mar 17 09:11:56 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 10:11:56 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA09478.90809@gmx.de> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de> <4BA09478.90809@gmx.de> Message-ID: <4BA09CDC.1040402@gmx.de> Am I mad ??? 2nd. correction: But for (int i = offset; i < offset + count; i++) { int c = codePoints[i]; char plane = (char)(c >>> 16); if (plane == 0) n += 1; else if (plane < 0x11) n += 2; else throw new IllegalArgumentException(Integer.toString(c)); } has too only 2 branches and additionally could benefit from tiny 16-bit comparisons. The shift additionally could be omitted on CPU's which can benefit from 6933327. Instead: for (int i = offset; i < offset + count; i++) { int c = codePoints[i]; if (c >= Character.MIN_VALUE && c <= Character.MAX_VALUE) n += 1; else if (c >= Character.MIN_SUPPLEMENTARY_CODE_POINT && c <= Character.MAX_SUPPLEMENTARY_CODE_POINT) n += 2; else throw new IllegalArgumentException(Integer.toString(c)); } needs 4 branches and 4 32-bit comparisons. 
-Ulf From martinrb at google.com Wed Mar 17 15:46:42 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 17 Mar 2010 07:46:42 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA09CDC.1040402@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de> <4BA09478.90809@gmx.de> <4BA09CDC.1040402@gmx.de> Message-ID: <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> On Wed, Mar 17, 2010 at 01:11, Ulf Zibis wrote: > Am I mad ??? > > 2nd. correction: > > But > for (int i = offset; i < offset + count; i++) { > int c = codePoints[i]; > char plane = (char)(c >>> 16); > if (plane == 0) > n += 1; > else if (plane < 0x11) > n += 2; > else throw new IllegalArgumentException(Integer.toString(c)); > } > has too only 2 branches and additionally could benefit from tiny 16-bit > comparisons. > The shift additionally could be omitted on CPU's which can benefit from > 6933327. I'm not a x86 or hotspot expert, but I would think that the "plane" variable is never written to memory, but lives only in a register, so I see only drawbacks to making plane a "char". 
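The two loops discussed above can be compared side by side. What follows is only a minimal sketch: the class name PlaneCount and the method names countByPlane and countByRange are hypothetical, and plane is kept as a plain int rather than a char or byte. Since Character has no MAX_SUPPLEMENTARY_CODE_POINT constant, the range version uses Character.MAX_CODE_POINT instead. No claim is made here about which form HotSpot compiles to better machine code.

```java
/** Hypothetical helper class; the names are illustrative and not part of the JDK. */
final class PlaneCount {

    /** Counts the UTF-16 chars needed for the given code points, testing the plane number. */
    static int countByPlane(int[] codePoints, int offset, int count) {
        int n = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            int plane = c >>> 16;      // 0 for the BMP, 1..16 for supplementary planes
            if (plane == 0)
                n += 1;
            else if (plane < 0x11)
                n += 2;
            else                       // also reached for negative c, since >>> is unsigned
                throw new IllegalArgumentException(Integer.toString(c));
        }
        return n;
    }

    /** Same count, written with explicit range checks against the Character constants. */
    static int countByRange(int[] codePoints, int offset, int count) {
        int n = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            if (c >= Character.MIN_VALUE && c <= Character.MAX_VALUE)
                n += 1;
            else if (c >= Character.MIN_SUPPLEMENTARY_CODE_POINT
                    && c <= Character.MAX_CODE_POINT)
                n += 2;
            else
                throw new IllegalArgumentException(Integer.toString(c));
        }
        return n;
    }

    public static void main(String[] args) {
        int[] cps = {0x41, 0xFFFF, 0x10000, 0x10FFFF};
        System.out.println(countByPlane(cps, 0, cps.length));
        System.out.println(countByRange(cps, 0, cps.length));
    }
}
```

Both methods accept exactly the valid code points 0..0x10FFFF and reject everything else, so they should be interchangeable from the caller's point of view.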
Martin From Xueming.Shen at Sun.COM Wed Mar 17 17:05:48 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Wed, 17 Mar 2010 09:05:48 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de> <4BA09478.90809@gmx.de> <4BA09CDC.1040402@gmx.de> <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> Message-ID: <4BA10BEC.8050105@sun.com> Martin Buchholz wrote: > On Wed, Mar 17, 2010 at 01:11, Ulf Zibis wrote: > >> Am I mad ??? >> >> 2nd. correction: >> >> But >> for (int i = offset; i < offset + count; i++) { >> int c = codePoints[i]; >> char plane = (char)(c >>> 16); >> if (plane == 0) >> n += 1; >> else if (plane < 0x11) >> n += 2; >> else throw new IllegalArgumentException(Integer.toString(c)); >> } >> has too only 2 branches and additionally could benefit from tiny 16-bit >> comparisons. >> The shift additionally could be omitted on CPU's which can benefit from >> 6933327. >> > > I'm not a x86 or hotspot expert, but I would think that the "plane" > variable is never written to memory, but lives only in a register, > so I see only drawbacks to making plane a "char". > > I doubt there is any benefit to use a 8-bit or 16-bit operand on a 32-bit/64-bit machine. 
Optimization is definitely good, but it might not be a good habit to code in a high-level programming language while thinking in assembly every minute :-) Let's leave those optimizations to the HotSpot engineers :-) In this particular case, given that most applications will never use supplementary characters, I doubt it is really worth the optimization, and I would definitely not try to change the impl of isSupplementaryCP to make this code "better". If you really, really want to optimize this code, the alternative is to have a package-private Character.getPlane(), or simply to use your optimized code above. I would suggest using int for plane, instead of char or byte. -Sherman From Ulf.Zibis at gmx.de Wed Mar 17 16:29:27 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 17:29:27 +0100 Subject: 2 Questions on StringBuffer Message-ID: <4BA10367.5060307@gmx.de> Why there are 2 methods which do not use the super method, where I can't see any difference? : public synchronized char charAt(int index) public synchronized void setCharAt(int index, char ch) Wouldn't ensureCapacity better coded as follows? : public void ensureCapacity(int minimumCapacity) { if (minimumCapacity > value.length) synchronized { ensureCapacity(minimumCapacity); } } This would save the synchronization if there is nothing to do. -Ulf From forax at univ-mlv.fr Wed Mar 17 16:36:41 2010 From: forax at univ-mlv.fr (Rémi Forax) Date: Wed, 17 Mar 2010 17:36:41 +0100 Subject: 2 Questions on StringBuffer In-Reply-To: <4BA10367.5060307@gmx.de> References: <4BA10367.5060307@gmx.de> Message-ID: <4BA10519.2090509@univ-mlv.fr> On 17/03/2010 17:29, Ulf Zibis wrote: > Why there are 2 methods which do not use the super method, where I > can't see any difference? : > > public synchronized char charAt(int index) > public synchronized void setCharAt(int index, char ch) > > Wouldn't ensureCapacity better coded as follows? 
: > public void ensureCapacity(int minimumCapacity) { > if (minimumCapacity > value.length) synchronized { > ensureCapacity(minimumCapacity); > } > } > This would save the synchronization if there is nothing to do. > > -Ulf > > > no, it doesn't work. if some variables are not in the synchronized block, they can be updated by one thread but this change will be not visible in another thread. Rémi From Ulf.Zibis at gmx.de Wed Mar 17 17:01:08 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 18:01:08 +0100 Subject: 2 Questions on StringBuffer In-Reply-To: <4BA10519.2090509@univ-mlv.fr> References: <4BA10367.5060307@gmx.de> <4BA10519.2090509@univ-mlv.fr> Message-ID: <4BA10AD4.9010602@gmx.de> On 17.03.2010 17:36, Rémi Forax wrote: > On 17/03/2010 17:29, Ulf Zibis wrote: >> Why there are 2 methods which do not use the super method, where I >> can't see any difference? : >> >> public synchronized char charAt(int index) >> public synchronized void setCharAt(int index, char ch) >> >> Wouldn't ensureCapacity better coded as follows? : >> public void ensureCapacity(int minimumCapacity) { >> if (minimumCapacity > value.length) synchronized { >> ensureCapacity(minimumCapacity); >> } >> } >> This would save the synchronization if there is nothing to do. >> >> -Ulf >> >> >> > > no, it doesn't work. > if some variables are not in the synchronized block, > they can be updated by one thread but this change will be not visible > in another thread. Hm, those values are checked again in the super.ensureCapacity(), so inside the synchronized block. I guess this is the answer on my 2nd question, thanks. 
Please excuse the little typo, I meant: super.ensureCapacity(minimumCapacity); -Ulf From martinrb at google.com Wed Mar 17 17:41:12 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 17 Mar 2010 09:41:12 -0800 Subject: 2 Questions on StringBuffer In-Reply-To: <4BA10367.5060307@gmx.de> References: <4BA10367.5060307@gmx.de> Message-ID: <1ccfd1c11003171041y5ed161efhd4d4b39716e7e2db@mail.gmail.com> On Wed, Mar 17, 2010 at 08:29, Ulf Zibis wrote: > Why there are 2 methods which do not use the super method, where I can't see > any difference? : > > public synchronized char charAt(int index) > public synchronized void setCharAt(int index, char ch) You're correct that these methods could be refactored to call super ("DRY"), but the code duplication is small, and these methods are performance-critical, so let's just leave them as is. Martin From Ulf.Zibis at gmx.de Wed Mar 17 18:02:13 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 19:02:13 +0100 Subject: 2 Questions on StringBuffer In-Reply-To: <1ccfd1c11003171041y5ed161efhd4d4b39716e7e2db@mail.gmail.com> References: <4BA10367.5060307@gmx.de> <1ccfd1c11003171041y5ed161efhd4d4b39716e7e2db@mail.gmail.com> Message-ID: <4BA11925.7080702@gmx.de> On 17.03.2010 18:41, Martin Buchholz wrote: > On Wed, Mar 17, 2010 at 08:29, Ulf Zibis wrote: > >> Why there are 2 methods which do not use the super method, where I can't see >> any difference? : >> >> public synchronized char charAt(int index) >> public synchronized void setCharAt(int index, char ch) >> > You're correct that these methods > could be refactored to call super ("DRY"), > but the code duplication is small, > and these methods are performance-critical, > so let's just leave them as is. > Additionally I think, there's a bug in javadoc of those methods. Actually they throw StringIndexOutOfBoundsException. 
-Ulf From martinrb at google.com Wed Mar 17 19:12:41 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 17 Mar 2010 11:12:41 -0800 Subject: 2 Questions on StringBuffer In-Reply-To: <4BA11925.7080702@gmx.de> References: <4BA10367.5060307@gmx.de> <1ccfd1c11003171041y5ed161efhd4d4b39716e7e2db@mail.gmail.com> <4BA11925.7080702@gmx.de> Message-ID: <1ccfd1c11003171212r199a0d7bn49afe7f570f767ee@mail.gmail.com> On Wed, Mar 17, 2010 at 10:02, Ulf Zibis wrote: > On 17.03.2010 18:41, Martin Buchholz wrote: >> >> On Wed, Mar 17, 2010 at 08:29, Ulf Zibis wrote: >> >>> >>> Why there are 2 methods which do not use the super method, where I can't >>> see >>> any difference? : >>> >>> public synchronized char charAt(int index) >>> public synchronized void setCharAt(int index, char ch) >>> >> >> You're correct that these methods >> could be refactored to call super ("DRY"), >> but the code duplication is small, >> and these methods are performance-critical, >> so let's just leave them as is. >> > > Additionally I think, there's a bug in javadoc of those methods. > Actually they throw StringIndexOutOfBoundsException. Why would that be a bug? Martin From martinrb at google.com Wed Mar 17 21:16:36 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 17 Mar 2010 13:16:36 -0800 Subject: request for paired constants in j.l.Character In-Reply-To: <4BA02CFC.8010907@gmx.de> References: <4BA02CFC.8010907@gmx.de> Message-ID: <1ccfd1c11003171416q32a2820bh3cfb1f0fc93923f0@mail.gmail.com> On Tue, Mar 16, 2010 at 17:14, Ulf Zibis wrote: > In java.lang.Character we have: > public static final char MIN_VALUE = '\u0000'; > public static final char MAX_VALUE = '\uFFFF'; > public static final int MIN_CODE_POINT = 0x000000; > public static final int MAX_CODE_POINT = 0X10FFFF; > public static final int MIN_SUPPLEMENTARY_CODE_POINT = MAX_VALUE + 1; > > As we have MIN_CODE_POINT, which is duplicate of MIN_VALUE, IMO we > additionally could have > 
public static final int MAX_SUPPLEMENTARY_CODE_POINT = MAX_CODE_POINT; > > It would look better and serve plenty users expectations to find those > MIN/MAX constants as pair. > > Is there anybody who agrees with me ? I agree that the symmetry of MIN/MAX pairs is a good thing to maintain, and so your suggestion is slightly better than the status quo, but ... IMO not better enough to actually justify making any change to the Java Platform API. Martin From Ulf.Zibis at gmx.de Wed Mar 17 21:24:57 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 17 Mar 2010 22:24:57 +0100 Subject: 2 Questions on StringBuffer In-Reply-To: <1ccfd1c11003171212r199a0d7bn49afe7f570f767ee@mail.gmail.com> References: <4BA10367.5060307@gmx.de> <1ccfd1c11003171041y5ed161efhd4d4b39716e7e2db@mail.gmail.com> <4BA11925.7080702@gmx.de> <1ccfd1c11003171212r199a0d7bn49afe7f570f767ee@mail.gmail.com> Message-ID: <4BA148A9.7090605@gmx.de> On 17.03.2010 20:12, Martin Buchholz wrote: > On Wed, Mar 17, 2010 at 10:02, Ulf Zibis wrote: > >> >> Additionally I think, there's a bug in javadoc of those methods. >> Actually they throw StringIndexOutOfBoundsException. >> > Why would that be a bug? > I think, javadoc should indicate StringIndexOutOfBoundsException here: /** * @throws IndexOutOfBoundsException {@inheritDoc} * @see #length() */ public synchronized char charAt(int index) { if ((index < 0) || (index >= count)) throw new StringIndexOutOfBoundsException(index); return value[index]; } Or am I not enough informed about {@inheritDoc} ? 
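The relationship between the documented and the thrown exception type can be checked with a small probe. This is only a sketch: CharAtSpecDemo and probe are hypothetical names, and the code relies on nothing beyond the documented contract that CharSequence.charAt throws IndexOutOfBoundsException for an invalid index.

```java
/** Hypothetical demo class, not part of the JDK. */
final class CharAtSpecDemo {

    /** Returns the exception thrown by s.charAt(index), or null if the call succeeds. */
    static RuntimeException probe(CharSequence s, int index) {
        try {
            s.charAt(index);
            return null;
        } catch (IndexOutOfBoundsException e) {
            // Catches the documented type and, with it, any subclass
            // the implementation chooses to throw.
            return e;
        }
    }

    public static void main(String[] args) {
        RuntimeException e = probe(new StringBuffer("abc"), 5);
        // Prints the concrete class chosen by the running implementation.
        System.out.println(e.getClass().getName());
    }
}
```

Because StringIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException, an implementation that throws the subclass still satisfies a javadoc that promises only the supertype; a caller written against the javadoc keeps working either way.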
-Ulf From martinrb at google.com Wed Mar 17 22:00:31 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 17 Mar 2010 15:00:31 -0700 Subject: 2 Questions on StringBuffer In-Reply-To: <4BA148A9.7090605@gmx.de> References: <4BA10367.5060307@gmx.de> <1ccfd1c11003171041y5ed161efhd4d4b39716e7e2db@mail.gmail.com> <4BA11925.7080702@gmx.de> <1ccfd1c11003171212r199a0d7bn49afe7f570f767ee@mail.gmail.com> <4BA148A9.7090605@gmx.de> Message-ID: <1ccfd1c11003171500x7423cf7cv941a6d87361357c5@mail.gmail.com> On Wed, Mar 17, 2010 at 14:24, Ulf Zibis wrote: > On 17.03.2010 20:12, Martin Buchholz wrote: >> >> On Wed, Mar 17, 2010 at 10:02, Ulf Zibis wrote: >> >>> >>> Additionally I think, there's a bug in javadoc of those methods. >>> Actually they throw StringIndexOutOfBoundsException. >>> >> >> Why would that be a bug? >> > > I think, javadoc should indicate StringIndexOutOfBoundsException here: That would be an incompatible tightening of the spec. To understand this, you need to think abstractly about the specification and implementation of the Java Platform as two completely separate things. Martin > /** > * @throws IndexOutOfBoundsException {@inheritDoc} > * @see #length() > */ > public synchronized char charAt(int index) { > if ((index < 0) || (index >= count)) > throw new StringIndexOutOfBoundsException(index); > return value[index]; > } > > > Or am I not enough informed about {@inheritDoc} ? > > -Ulf > > > > From i30817 at gmail.com Wed Mar 17 23:59:36 2010 From: i30817 at gmail.com (Paulo Levi) Date: Wed, 17 Mar 2010 23:59:36 +0000 Subject: Do Set implementations waste memory? Message-ID: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> My understanding is that set implementations are implemented by using Maps internally + a marker object, and that since Maps are implemented using arrays of entries this is at least n*3 references more that what is needed, since there are never multiple values. 
Any plans to change this? I suspect it would be a boon for programs that use the correct data structure. From forax at univ-mlv.fr Thu Mar 18 01:16:50 2010 From: forax at univ-mlv.fr (Rémi Forax) Date: Thu, 18 Mar 2010 02:16:50 +0100 Subject: Do Set implementations waste memory? In-Reply-To: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> References: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> Message-ID: <4BA17F02.7050906@univ-mlv.fr> On 18/03/2010 00:59, Paulo Levi wrote: > My understanding is that set implementations are implemented by using > Maps internally + a marker object, and that since Maps are implemented > using arrays of entries this is at least n*3 references more that what > is needed, since there are never multiple values. > > Any plans to change this? I suspect it would be a boon for programs > that use the correct data structure. > You have to test it. My guess is that there will be no difference. As far as I remember, an object needs to be aligned on a valid 64bits address even in 32bits mode, Hotspot uses a 64bits header and the internal hash map entry contains 4 ints, if you remove the reference corresponding to the value, the empty place will be considered as garbage and not used. Else, you can try to remove the internal entry object but in that case the hashcode of the element will be not stored anymore and you will have a slowdown for all objects that doesn't cache their hashcode by itself. Rémi From jim.andreou at gmail.com Thu Mar 18 02:07:10 2010 From: jim.andreou at gmail.com (Dimitris Andreou) Date: Thu, 18 Mar 2010 04:07:10 +0200 Subject: Do Set implementations waste memory? 
In-Reply-To: <4BA17F02.7050906@univ-mlv.fr> References: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> <4BA17F02.7050906@univ-mlv.fr> Message-ID: <7d7138c11003171907x4038968bwc5ba27e1661c9988@mail.gmail.com> 2010/3/18 Rémi Forax > On 18/03/2010 00:59, Paulo Levi wrote: > > My understanding is that set implementations are implemented by using Maps >> internally + a marker object, and that since Maps are implemented using >> arrays of entries this is at least n*3 references more that what is needed, >> since there are never multiple values. >> >> Any plans to change this? I suspect it would be a boon for programs that >> use the correct data structure. >> >> > You have to test it. > My guess is that there will be no difference. > As far as I remember, an object needs to be aligned on a valid 64bits > address even in 32bits mode, > Hotspot uses a 64bits header and the internal hash map entry contains 4 > ints, > if you remove the reference corresponding to the value, the empty place > will be > considered as garbage and not used. > > Else, you can try to remove the internal entry object but in that case > the hashcode of the element will be not stored anymore and you will > have a slowdown for all objects that doesn't cache their hashcode by > itself. > > Rémi > > See my second-to-last post in this thread: http://groups.google.com/group/guava-discuss/browse_thread/thread/23bc8fa5ae479698 In short, I tested removing the "value" field of a HashMap's entry object, and indeed (through Instrumentation#getObjectSize) I observed no reduction in memory. I had to remove one further field (e.g. "hash") to make a reduction (of 8 bytes per entry). Dimitris -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From weijun.wang at sun.com Thu Mar 18 10:27:35 2010 From: weijun.wang at sun.com (weijun.wang at sun.com) Date: Thu, 18 Mar 2010 10:27:35 +0000 Subject: hg: jdk7/tl/jdk: 6829283: HTTP/Negotiate: Autheticator triggered again when user cancels the first one Message-ID: <20100318102854.BFEB644088@hg.openjdk.java.net> Changeset: 2796f839e337 Author: weijun Date: 2010-03-18 18:26 +0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/2796f839e337 6829283: HTTP/Negotiate: Autheticator triggered again when user cancels the first one Reviewed-by: chegar ! src/share/classes/sun/net/www/protocol/http/spnego/NegotiateCallbackHandler.java ! test/sun/security/krb5/auto/HttpNegotiateServer.java From opinali at gmail.com Thu Mar 18 13:22:47 2010 From: opinali at gmail.com (Osvaldo Doederlein) Date: Thu, 18 Mar 2010 10:22:47 -0300 Subject: Do Set implementations waste memory? In-Reply-To: <7d7138c11003171907x4038968bwc5ba27e1661c9988@mail.gmail.com> References: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> <4BA17F02.7050906@univ-mlv.fr> <7d7138c11003171907x4038968bwc5ba27e1661c9988@mail.gmail.com> Message-ID: Hi, I've tread the google-groups thread, it seems you didn't test on a 64-bit VM. Could you do that test, with and without CompressedOops, and using latest HotSpot (7b85 or 6u20ea)? I guess we should see advantages in both memory savings and speed, at least with CompressedOops. It is too easy to dismiss an optimization on the basis of "doesn't deliver benefit on a particular VM". It may be good on a different implementation, or a different architecture like 32 vs. 64 bits. Object headers, field layouts, alignments etc., are not portable, and the best rule of thumb is that any removed field _will_ reduce memory usage at least in some implementation. The oldest collection classes were designed for the needs of J2SE 1.2, a full decade ago. 
This was discussed before; IIRC there was some reply from Josh agreeing that some speed-vs-size tradeoffs made last decade should be revisited today. The extra runtime size/bloat that a specialized HashSet implementation would cost was reasonably significant in 1999 but is completely irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and the benefit of any optimization of its implementation would spread to many APIs and applications. And the problem is not only the extra value field; there is also overhead from the extra indirection (plus an extra polymorphic call) from the HashSet object to the internal HashMap object. This overhead may sometimes be sufficient to block inlining and devirtualization, so it's a potentially bigger cost than just a single extra memory load (which is easily hoisted out of loops etc.). Look at this code inside HashSet for a much worse cost: public Iterator<E> iterator() { return map.keySet().iterator(); } Yeah, we pay the cost of building the internal HashMap's key-set (which is lazily built), just to iterate the freaking HashSet. (Notice that, unlike HashMap, a Set is a true Collection that we can iterate directly without any view-collection of keys/values/entries.) IMHO all this adds evidence that the current HashSet implementation is a significant performance bug. We need a brand-new impl that does the hashing internally, without relying on HashMap, without any unused fields, extra indirections, or surprising costs like that for iterator(). I guess it would be relatively simple to copy-paste HashMap's code, cut stuff until just a Set of keys is left, and merge in the most specific pieces of HashSet (basically just readObject()/writeObject()).
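To make the idea of a set that "does the hashing internally" concrete, here is a minimal sketch. It is not the copy-paste-from-HashMap approach described above (which would keep chained entry objects); it swaps in open addressing with linear probing over a single array of keys, so there is no per-element entry object and no value field at all. The class name CompactHashSet is hypothetical, null elements are rejected, and removal, iteration and serialization are left out to keep it short.

```java
import java.util.Objects;

/** Hypothetical sketch of a hash set that does not wrap a HashMap; not JDK code. */
final class CompactHashSet<E> {
    private Object[] keys = new Object[16];   // power-of-two length, null marks an empty slot
    private int size;

    /** Adds e and returns true if it was not already present. */
    public boolean add(E e) {
        Objects.requireNonNull(e);
        if ((size + 1) * 2 > keys.length)     // keep the table at most half full
            resize();
        int i = slotFor(e, keys);
        if (keys[i] != null)
            return false;                     // an equal element is already stored
        keys[i] = e;
        size++;
        return true;
    }

    public boolean contains(Object o) {
        return o != null && keys[slotFor(o, keys)] != null;
    }

    public int size() {
        return size;
    }

    /** Returns the slot holding an element equal to o, or the empty slot where o belongs. */
    private static int slotFor(Object o, Object[] table) {
        int mask = table.length - 1;
        int h = o.hashCode();
        h ^= (h >>> 16);                      // mix high bits into the index bits
        int i = h & mask;
        // The table is never full, so probing always reaches an empty slot.
        while (table[i] != null && !table[i].equals(o))
            i = (i + 1) & mask;
        return i;
    }

    /** Doubles the table and reinserts every key. */
    private void resize() {
        Object[] old = keys;
        Object[] bigger = new Object[old.length * 2];
        for (Object k : old)
            if (k != null)
                bigger[slotFor(k, bigger)] = k;
        keys = bigger;
    }
}
```

The trade-off is the one raised earlier in the thread: without a stored hash, every probe and every resize calls hashCode() and equals() on the elements again, which is slower for types that do not cache their hash code. Whether the memory saved outweighs that would have to be measured on the VM in question.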
A+ Osvaldo 2010/3/17 Dimitris Andreou > > 2010/3/18 Rémi Forax > > On 18/03/2010 00:59, Paulo Levi wrote: >> >> My understanding is that set implementations are implemented by using >>> Maps internally + a marker object, and that since Maps are implemented using >>> arrays of entries this is at least n*3 references more that what is needed, >>> since there are never multiple values. >>> >>> Any plans to change this? I suspect it would be a boon for programs that >>> use the correct data structure. >>> >> >> You have to test it. My guess is that there will be no difference. >> As far as I remember, an object needs to be aligned on a valid 64bits >> address even in 32bits mode, >> Hotspot uses a 64bits header and the internal hash map entry contains 4 >> ints, >> if you remove the reference corresponding to the value, the empty place >> will be >> considered as garbage and not used. >> > > Else, you can try to remove the internal entry object but in that case >> the hashcode of the element will be not stored anymore and you will >> have a slowdown for all objects that doesn't cache their hashcode by >> itself. >> >> > See my second-to-last post in this thread: > > http://groups.google.com/group/guava-discuss/browse_thread/thread/23bc8fa5ae479698 > > In short, I tested removing the "value" field of a HashMap's entry object, > and indeed (through Instrumentation#getObjectSize) I observed no reduction > in memory. I had to remove one further field (e.g. "hash") to make a > reduction (of 8 bytes per entry). > From Ulf.Zibis at gmx.de Thu Mar 18 13:54:17 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 18 Mar 2010 14:54:17 +0100 Subject: Do Set implementations waste memory? 
In-Reply-To: References: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> <4BA17F02.7050906@univ-mlv.fr> <7d7138c11003171907x4038968bwc5ba27e1661c9988@mail.gmail.com> Message-ID: <4BA23089.6070803@gmx.de> +1 -Ulf Am 18.03.2010 14:22, schrieb Osvaldo Doederlein: > > The oldest collection classes were designed for the needs of J2SE 1.2, > a full decade ago. This was discussed before, IIRC there was some > reply from Josh agreeing that some speeed-vs-size tradeoffs made last > decade should be revisited today. The extra runtime size/bloat that a > specialized HashSet implementation would cost, was reasonably > significant in 1999 but completely irrelevant in 2010. I mean, HashSet > is a HUGELY important collection, and the benefit of any optimization > of its implementation would spread to many APIs and applications. > > And the problem is not only the extra value field, there is also > overhead from the extra indirection (plus extra polymorphic call) from > the HashSet object to the internal HashMap object. This overhead may > sometimes be sufficient to block inlining and devirtualization, so > it's a potentially bigger cost than just a single extra memory load > (which is easily hoisted out of loops etc.). Look at this code inside > HashSet for a much worse cost: > > public Iterator iterator() { > return map.keySet().iterator(); > } > > Yeah we pay the cost of building the internal HashMap's key-set (which > is lazily-built), just to iterate the freaking HashSet. (Notice that > differently from HashMap, a Set is a true Collection that we can > iterate directly without any view-collection of keys/values/entries.) > > IMHO all this adds evidence that the current HashSet implementation is > a significant performance bug. We need a brand-new impl that does the > hashing internally, without relying on HashMap, without any unused > fields, extra indirections, or surprising costs like that for > iterator(). 
I guess it would be relatively simple to copy-paste > HashMap's code, cut stuff until just a Set of keys is left, and merge > in the most specific pieces of HashSet (basically just > readObject()/writeObject()). > From Ulf.Zibis at gmx.de Thu Mar 18 14:09:20 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 18 Mar 2010 15:09:20 +0100 Subject: Do Set implementations waste memory? In-Reply-To: <4BA23089.6070803@gmx.de> References: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> <4BA17F02.7050906@univ-mlv.fr> <7d7138c11003171907x4038968bwc5ba27e1661c9988@mail.gmail.com> <4BA23089.6070803@gmx.de> Message-ID: <4BA23410.1000204@gmx.de> ... and please consider Bug 6812862 - provide customizable hash() algorithm in HashMap for speed tuning again and too for HashSet. -Ulf Am 18.03.2010 14:54, schrieb Ulf Zibis: > +1 > > -Ulf > > Am 18.03.2010 14:22, schrieb Osvaldo Doederlein: >> >> The oldest collection classes were designed for the needs of J2SE >> 1.2, a full decade ago. This was discussed before, IIRC there was >> some reply from Josh agreeing that some speeed-vs-size tradeoffs made >> last decade should be revisited today. The extra runtime size/bloat >> that a specialized HashSet implementation would cost, was reasonably >> significant in 1999 but completely irrelevant in 2010. I mean, >> HashSet is a HUGELY important collection, and the benefit of any >> optimization of its implementation would spread to many APIs and >> applications. >> >> And the problem is not only the extra value field, there is also >> overhead from the extra indirection (plus extra polymorphic call) >> from the HashSet object to the internal HashMap object. This overhead >> may sometimes be sufficient to block inlining and devirtualization, >> so it's a potentially bigger cost than just a single extra memory >> load (which is easily hoisted out of loops etc.). 
Look at this code >> inside HashSet for a much worse cost: >> >> public Iterator iterator() { >> return map.keySet().iterator(); >> } >> >> Yeah we pay the cost of building the internal HashMap's key-set >> (which is lazily-built), just to iterate the freaking HashSet. >> (Notice that differently from HashMap, a Set is a true Collection >> that we can iterate directly without any view-collection of >> keys/values/entries.) >> >> IMHO all this adds evidence that the current HashSet implementation >> is a significant performance bug. We need a brand-new impl that does >> the hashing internally, without relying on HashMap, without any >> unused fields, extra indirections, or surprising costs like that for >> iterator(). I guess it would be relatively simple to copy-paste >> HashMap's code, cut stuff until just a Set of keys is left, and merge >> in the most specific pieces of HashSet (basically just >> readObject()/writeObject()). >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim.andreou at gmail.com Thu Mar 18 14:10:01 2010 From: jim.andreou at gmail.com (Dimitris Andreou) Date: Thu, 18 Mar 2010 14:10:01 +0000 Subject: Do Set implementations waste memory? In-Reply-To: References: <212322091003171659o7f57afddg21493451887c5b3e@mail.gmail.com> <4BA17F02.7050906@univ-mlv.fr> <7d7138c11003171907x4038968bwc5ba27e1661c9988@mail.gmail.com> Message-ID: <7d7138c11003180710t35f1d065oac3ffebabf7d271e@mail.gmail.com> 2010/3/18 Osvaldo Doederlein > Hi, > > I've tread the google-groups thread, it seems you didn't test on a 64-bit > VM. Could you do that test, with and without CompressedOops, and using > latest HotSpot (7b85 or 6u20ea)? I guess we should see advantages in both > memory savings and speed, at least with CompressedOops. > > It is too easy to dismiss an optimization on the basis of "doesn't deliver > benefit on a particular VM". It may be good on a different implementation, > or a different architecture like 32 vs. 64 bits. 
Object headers, field > layouts, alignments etc., are not portable, and the best rule of thumb is > that any removed field _will_ reduce memory usage at least in some > implementation. > > The oldest collection classes were designed for the needs of J2SE 1.2, a > full decade ago. This was discussed before, IIRC there was some reply from > Josh agreeing that some speeed-vs-size tradeoffs made last decade should be > revisited today. The extra runtime size/bloat that a specialized HashSet > implementation would cost, was reasonably significant in 1999 but completely > irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and > the benefit of any optimization of its implementation would spread to many > APIs and applications. > > And the problem is not only the extra value field, there is also overhead > from the extra indirection (plus extra polymorphic call) from the HashSet > object to the internal HashMap object. This overhead may sometimes be > sufficient to block inlining and devirtualization, so it's a potentially > bigger cost than just a single extra memory load (which is easily hoisted > out of loops etc.). Look at this code inside HashSet for a much worse cost: > > public Iterator iterator() { > return map.keySet().iterator(); > } > > Yeah we pay the cost of building the internal HashMap's key-set (which is > lazily-built), just to iterate the freaking HashSet. (Notice that > differently from HashMap, a Set is a true Collection that we can iterate > directly without any view-collection of keys/values/entries.) > > IMHO all this adds evidence that the current HashSet implementation is a > significant performance bug. We need a brand-new impl that does the hashing > internally, without relying on HashMap, without any unused fields, extra > indirections, or surprising costs like that for iterator(). 
I guess it would > be relatively simple to copy-paste HashMap's code, cut stuff until just a > Set of keys is left, and merge in the most specific pieces of HashSet > (basically just readObject()/writeObject()). > Hi, Sorry, I was disappointed by the result and sent the code to /dev/null, so can't readily test that, but yes, it is a relatively simple exercise. In my opinion, if someone is going to undertake the task of creating a new HashSet, he'd better start from a blank page, not going the "HashMap-->snip-->snip-->HashSet" path. Even if for some platforms there would be some gains through this path, not reducing memory footprint on a large number of 32-bit platforms would be quite a pity. (About runtime performance, given the amount of "magic" in the JVM, I dare to say even less). I find what Martin suggests on that thread (his second-to-last post) a quite promising alternative (open addressing plus two parallel arrays, for keys and for hashes). Just my 2c. I would love to know in more detail Doug's opinion on the matter. Dimitris > > A+ > Osvaldo > > 2010/3/17 Dimitris Andreou > >> >> 2010/3/18 Rémi Forax >> >> Le 18/03/2010 00:59, Paulo Levi a écrit : >>> >>> My understanding is that set implementations are implemented by using >>>> Maps internally + a marker object, and that since Maps are implemented using >>>> arrays of entries this is at least n*3 references more than what is needed, >>>> since there are never multiple values. >>>> >>>> Any plans to change this? I suspect it would be a boon for programs that >>>> use the correct data structure. >>>> >>> >>> You have to test it. My guess is that there will be no difference. >>> As far as I remember, an object needs to be aligned on a valid 64bits >>> address even in 32bits mode, >>> Hotspot uses a 64bits header and the internal hash map entry contains 4 >>> ints, >>> if you remove the reference corresponding to the value, the empty place >>> will be >>> considered as garbage and not used. 
>>> >> >> Else, you can try to remove the internal entry object but in that case >>> the hashcode of the element will not be stored anymore and you will >>> have a slowdown for all objects that don't cache their hashcode >>> themselves. >>> >>> >> See my second-to-last post in this thread: >> >> http://groups.google.com/group/guava-discuss/browse_thread/thread/23bc8fa5ae479698 >> >> In short, I tested removing the "value" field of a HashMap's entry object, >> and indeed (through Instrumentation#getObjectSize) I observed no reduction >> in memory. I had to remove one further field (e.g. "hash") to make a >> reduction (of 8 bytes per entry). >> > > From yu-ching.peng at sun.com Fri Mar 19 01:08:20 2010 From: yu-ching.peng at sun.com (yu-ching.peng at sun.com) Date: Fri, 19 Mar 2010 01:08:20 +0000 Subject: hg: jdk7/tl/jdk: 3 new changesets Message-ID: <20100319010953.BA4894417C@hg.openjdk.java.net> Changeset: c52f292a8f86 Author: valeriep Date: 2010-03-18 17:05 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/c52f292a8f86 6695485: SignedObject constructor throws ProviderException if it's called using provider "SunPKCS11-Solaris" Summary: Added checking for RSA key lengths in initSign and initVerify Reviewed-by: vinnie ! src/share/classes/sun/security/pkcs11/P11Signature.java + test/sun/security/pkcs11/Signature/TestRSAKeyLength.java Changeset: df5714cbe76d Author: valeriep Date: 2010-03-18 17:32 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/df5714cbe76d 6591117: Poor preformance of PKCS#11 security provider compared to Sun default provider Summary: Added internal buffering to PKCS11 SecureRandom impl Reviewed-by: wetmore ! 
src/share/classes/sun/security/pkcs11/P11SecureRandom.java Changeset: dc42c9d9ca16 Author: valeriep Date: 2010-03-18 17:56 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/dc42c9d9ca16 6837847: PKCS#11 A SecureRandom and a serialization error following installation of 1.5.0_18 Summary: Added a custom readObject method to PKCS11 SecureRandom impl Reviewed-by: wetmore ! src/share/classes/sun/security/pkcs11/P11SecureRandom.java + test/sun/security/pkcs11/SecureRandom/TestDeserialization.java From lana.steuck at sun.com Fri Mar 19 05:17:04 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:17:04 +0000 Subject: hg: jdk7/tl: 3 new changesets Message-ID: <20100319051704.78292441C6@hg.openjdk.java.net> Changeset: 3ddf90b39176 Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/rev/3ddf90b39176 Added tag jdk7-b85 for changeset cf26288a114b ! .hgtags Changeset: 433a60a9c0bf Author: lana Date: 2010-03-09 15:28 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/rev/433a60a9c0bf Merge Changeset: 98505d97a822 Author: lana Date: 2010-03-18 18:50 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/rev/98505d97a822 Merge From lana.steuck at sun.com Fri Mar 19 05:17:11 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:17:11 +0000 Subject: hg: jdk7/tl/corba: Added tag jdk7-b85 for changeset c67a9df7bc0c Message-ID: <20100319051713.B7AF2441C7@hg.openjdk.java.net> Changeset: 6253e28826d1 Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/corba/rev/6253e28826d1 Added tag jdk7-b85 for changeset c67a9df7bc0c ! 
.hgtags From lana.steuck at sun.com Fri Mar 19 05:19:23 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:19:23 +0000 Subject: hg: jdk7/tl/hotspot: 2 new changesets Message-ID: <20100319051931.87330441C8@hg.openjdk.java.net> Changeset: 418bc80ce139 Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/418bc80ce139 Added tag jdk7-b85 for changeset 6c9796468b91 ! .hgtags Changeset: bf823ef06b4f Author: trims Date: 2010-03-08 15:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/hotspot/rev/bf823ef06b4f Added tag hs17-b10 for changeset 418bc80ce139 ! .hgtags From lana.steuck at sun.com Fri Mar 19 05:23:25 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:23:25 +0000 Subject: hg: jdk7/tl/jaxp: Added tag jdk7-b85 for changeset 6c0ccabb430d Message-ID: <20100319052325.DADBE441CA@hg.openjdk.java.net> Changeset: 81c0f115bbe5 Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jaxp/rev/81c0f115bbe5 Added tag jdk7-b85 for changeset 6c0ccabb430d ! .hgtags From lana.steuck at sun.com Fri Mar 19 05:23:32 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:23:32 +0000 Subject: hg: jdk7/tl/jaxws: Added tag jdk7-b85 for changeset 8424512588ff Message-ID: <20100319052332.21C08441CB@hg.openjdk.java.net> Changeset: 512b0e924a5a Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jaxws/rev/512b0e924a5a Added tag jdk7-b85 for changeset 8424512588ff ! 
.hgtags From lana.steuck at sun.com Fri Mar 19 05:24:32 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:24:32 +0000 Subject: hg: jdk7/tl/jdk: 22 new changesets Message-ID: <20100319053136.DA606441CE@hg.openjdk.java.net> Changeset: 03cd9e62961f Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/03cd9e62961f Added tag jdk7-b85 for changeset b396584a3e64 ! .hgtags Changeset: 840601ac5ab7 Author: rkennke Date: 2010-03-03 15:50 +0100 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/840601ac5ab7 6892485: Deadlock in SunGraphicsEnvironment / FontManager Summary: Synchronize on correct monitor in SunFontManager. Reviewed-by: igor, prr ! src/share/classes/sun/font/SunFontManager.java Changeset: 1d7db2d5c4c5 Author: minqi Date: 2010-03-08 11:35 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/1d7db2d5c4c5 6918065: Crash in Java2D blit loop (IntArgbToIntArgbPreSrcOverMaskBlit) in 64bit mode Reviewed-by: igor, bae ! src/share/classes/java/awt/AlphaComposite.java + test/java/awt/AlphaComposite/TestAlphaCompositeForNaN.java Changeset: 494f5e4f24da Author: lana Date: 2010-03-09 15:26 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/494f5e4f24da Merge Changeset: e64331144648 Author: rupashka Date: 2010-02-10 15:15 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/e64331144648 6848475: JSlider does not display the correct value of its BoundedRangeModel Reviewed-by: peterz ! src/share/classes/javax/swing/plaf/basic/BasicSliderUI.java + test/javax/swing/JSlider/6848475/bug6848475.java Changeset: f81c8041ccf4 Author: peytoia Date: 2010-02-11 15:58 +0900 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/f81c8041ccf4 6909002: Remove indicim.jar and thaiim.jar from JRE and move to samples if needed Reviewed-by: okutsu ! 
make/com/sun/Makefile Changeset: e2b58a45a426 Author: peytoia Date: 2010-02-12 14:38 +0900 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/e2b58a45a426 6921289: (tz) Support tzdata2010b Reviewed-by: okutsu ! make/sun/javazic/tzdata/VERSION ! make/sun/javazic/tzdata/antarctica ! make/sun/javazic/tzdata/asia ! make/sun/javazic/tzdata/australasia ! make/sun/javazic/tzdata/europe ! make/sun/javazic/tzdata/northamerica ! make/sun/javazic/tzdata/zone.tab ! src/share/classes/sun/util/resources/TimeZoneNames.java ! src/share/classes/sun/util/resources/TimeZoneNames_de.java ! src/share/classes/sun/util/resources/TimeZoneNames_es.java ! src/share/classes/sun/util/resources/TimeZoneNames_fr.java ! src/share/classes/sun/util/resources/TimeZoneNames_it.java ! src/share/classes/sun/util/resources/TimeZoneNames_ja.java ! src/share/classes/sun/util/resources/TimeZoneNames_ko.java ! src/share/classes/sun/util/resources/TimeZoneNames_sv.java ! src/share/classes/sun/util/resources/TimeZoneNames_zh_CN.java ! src/share/classes/sun/util/resources/TimeZoneNames_zh_TW.java Changeset: e8340332745e Author: malenkov Date: 2010-02-18 17:46 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/e8340332745e 4498236: RFE: Provide a toString method for PropertyChangeEvent and other classes Reviewed-by: peterz ! src/share/classes/java/beans/BeanDescriptor.java ! src/share/classes/java/beans/EventSetDescriptor.java ! src/share/classes/java/beans/FeatureDescriptor.java ! src/share/classes/java/beans/IndexedPropertyChangeEvent.java ! src/share/classes/java/beans/IndexedPropertyDescriptor.java ! src/share/classes/java/beans/MethodDescriptor.java ! src/share/classes/java/beans/PropertyChangeEvent.java ! src/share/classes/java/beans/PropertyDescriptor.java + test/java/beans/Introspector/Test4498236.java Changeset: 5c03237838e1 Author: rupashka Date: 2010-02-27 14:26 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/5c03237838e1 6913758: Specification for SynthViewportUI.paintBorder(...) 
should mention that this method is never called Reviewed-by: peterz ! src/share/classes/javax/swing/plaf/synth/SynthViewportUI.java Changeset: 96205ed1b196 Author: rupashka Date: 2010-02-27 14:47 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/96205ed1b196 6918447: SynthToolBarUI.setBorderToXXXX() methods don't correspond inherited spec. They do nothing. Reviewed-by: peterz ! src/share/classes/javax/swing/plaf/synth/SynthToolBarUI.java Changeset: 621e921a14cd Author: rupashka Date: 2010-02-27 15:09 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/621e921a14cd 6918861: SynthSliderUI.uninstallDefaults() is not called when UI is uninstalled Reviewed-by: malenkov ! src/share/classes/javax/swing/plaf/basic/BasicSliderUI.java ! src/share/classes/javax/swing/plaf/synth/SynthSliderUI.java + test/javax/swing/JSlider/6918861/bug6918861.java Changeset: 28741de0bb4a Author: rupashka Date: 2010-02-27 16:03 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/28741de0bb4a 6923305: SynthSliderUI paints the slider track when the slider's "paintTrack" property is set to false Reviewed-by: alexp ! src/share/classes/javax/swing/plaf/synth/SynthSliderUI.java + test/javax/swing/JSlider/6923305/bug6923305.java Changeset: 2bf137beb9bd Author: rupashka Date: 2010-02-27 16:14 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/2bf137beb9bd 6929298: The SynthSliderUI#calculateTickRect method should be removed Reviewed-by: peterz ! src/share/classes/javax/swing/plaf/synth/SynthSliderUI.java Changeset: d6b3a07c8752 Author: rupashka Date: 2010-03-03 17:57 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/d6b3a07c8752 6924059: SynthScrollBarUI.configureScrollBarColors() should have spec different from the overridden method Reviewed-by: peterz ! 
src/share/classes/javax/swing/plaf/synth/SynthScrollBarUI.java + test/javax/swing/JScrollBar/6924059/bug6924059.java Changeset: 30c520bd148f Author: rupashka Date: 2010-03-03 20:08 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/30c520bd148f 6913768: With default SynthLookAndFeel instance installed new JTable creation leads to throwing NPE Reviewed-by: peterz ! src/share/classes/javax/swing/JTable.java ! src/share/classes/javax/swing/plaf/synth/SynthTableUI.java + test/javax/swing/JTable/6913768/bug6913768.java Changeset: f13fc955be62 Author: rupashka Date: 2010-03-03 20:53 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/f13fc955be62 6917744: JScrollPane Page Up/Down keys do not handle correctly html tables with different cells contents Reviewed-by: peterz, alexp ! src/share/classes/javax/swing/text/DefaultEditorKit.java + test/javax/swing/JEditorPane/6917744/bug6917744.java + test/javax/swing/JEditorPane/6917744/test.html Changeset: 0622086d82ac Author: malenkov Date: 2010-03-04 21:17 +0300 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/0622086d82ac 6921644: XMLEncoder generates invalid XML Reviewed-by: peterz ! src/share/classes/java/beans/XMLEncoder.java + test/java/beans/XMLEncoder/Test5023550.java + test/java/beans/XMLEncoder/Test5023557.java + test/java/beans/XMLEncoder/Test6921644.java Changeset: 79a509ac8f35 Author: lana Date: 2010-03-01 18:30 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/79a509ac8f35 Merge ! 
make/com/sun/Makefile - make/java/text/FILES_java.gmk Changeset: 90248595ec35 Author: lana Date: 2010-03-04 13:07 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/90248595ec35 Merge Changeset: 2fe4e72288ce Author: lana Date: 2010-03-09 15:28 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/2fe4e72288ce Merge Changeset: eae6e9ab2606 Author: lana Date: 2010-03-09 15:29 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/eae6e9ab2606 Merge - test/java/nio/file/WatchService/OverflowEventIsLoner.java Changeset: dff4f51b73d4 Author: lana Date: 2010-03-18 18:52 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/dff4f51b73d4 Merge From lana.steuck at sun.com Fri Mar 19 05:37:45 2010 From: lana.steuck at sun.com (lana.steuck at sun.com) Date: Fri, 19 Mar 2010 05:37:45 +0000 Subject: hg: jdk7/tl/langtools: 3 new changesets Message-ID: <20100319053756.31267441D1@hg.openjdk.java.net> Changeset: b816baf594e3 Author: mikejwre Date: 2010-03-04 13:50 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/b816baf594e3 Added tag jdk7-b85 for changeset 136bfc679462 ! 
.hgtags Changeset: ef07347428f2 Author: lana Date: 2010-03-09 15:29 -0800 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/ef07347428f2 Merge - test/tools/javac/treepostests/TreePosTest.java Changeset: 6fad35d25b1e Author: lana Date: 2010-03-18 18:52 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/6fad35d25b1e Merge From christopher.hegarty at sun.com Fri Mar 19 13:27:27 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Fri, 19 Mar 2010 13:27:27 +0000 Subject: hg: jdk7/tl/jdk: 6935233: java/net/ServerSocket/AcceptCauseFileDescriptorLeak.java fails with modules build Message-ID: <20100319132803.016AC44258@hg.openjdk.java.net> Changeset: 3bb93c410f41 Author: chegar Date: 2010-03-19 13:07 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/3bb93c410f41 6935233: java/net/ServerSocket/AcceptCauseFileDescriptorLeak.java fails with modules build Reviewed-by: alanb ! test/ProblemList.txt ! test/java/net/ServerSocket/AcceptCauseFileDescriptorLeak.java + test/java/net/ServerSocket/AcceptCauseFileDescriptorLeak.sh From Xueming.Shen at Sun.COM Fri Mar 19 19:56:46 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Fri, 19 Mar 2010 11:56:46 -0800 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003161500o3d41e3felb2ab619f27095082@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <1ccfd1c11003161500o3d41e3felb2ab619f27095082@mail.gmail.com> Message-ID: <4BA3D6FE.1010800@sun.com> Martin Buchholz wrote: > I renamed my patch file from isSupplementaryCodePoint to isValidCodePoint. 
> > 6934268: Better implementation of Character.isValidCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint > It's fine. But if I were you I would not "optimize" it. > 6934265: Add public method Character.isBMPCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint > Looks fine. I will let you know when the CCC is approved. > 6934270: Remove javac warnings from Character.java > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings > Looks fine. > 6934271: Better handling of longer utf-8 sequences > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling > Looks good, though the code style looks really really...strange:-) > 6935172: Optimize bit-twiddling in Bits.java > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java > Looks fine. I was surprised the javac compiler really generates the code for expr + 0 and expr << 0. I kinda remember the gcc compiler can optimize this kind of situation to just expr (if my memory is correct, or maybe that was the kind of optimization I was planning to do in one of my projects :-) ). -Sherman > Martin > > On Tue, Mar 16, 2010 at 15:35, Xueming Shen wrote: > >> Martin Buchholz wrote: >> >>> On Tue, Mar 16, 2010 at 13:06, Xueming Shen wrote: >>> >>> >>>> Martin Buchholz wrote: >>>> >>>> >>>>> Therefore the existing implementation >>>>> >>>>> >>>>>>> return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT >>>>>>> && codePoint <= MAX_CODE_POINT; >>>>>>> >>>>>>> will almost always perform just one comparison against a constant, >>>>>>> which is hard to beat. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> 1. Wondering: I think there are TWO comparisons. >>>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>>> against >>>>>> only 8 bit values in my case. >>>>>> >>>>>> >>>>>> >>>>> It's a good point. In the machine code, shifts are likely to use >>>>> immediate values, and so will be a small win. 
>>>>> >>>>> int x = codePoint >>> 16; >>>>> return x != 0 && x < 0x11; >>>>> >>>>> (On modern hardware, these optimizations >>>>> are less valuable than they used to be; >>>>> ordinary integer arithmetic is almost free) >>>>> >>>>> >>>>> >>>>> >>>> I'm not convinced if the proposed code is really better...a "small win". >>>> >>>> >>> The primary theory here is that branches are expensive, >>> and we are reducing them by one. >>> >>> >>> >> There are still two branches in new impl, if you count the "ifeq" and >> "if_icmpge"(?) >> >> We are trying to "optimize" this piece of code with the assumption that the >> new impl MIGHT help certain vm (hotspot?) >> to optimize certain use scenario (some consecutive usages), if the compiler >> and/or the vm are both smart enough at certain >> point, with no supporting benchmark data? >> >> My concern is that the reality might be that this optimization might even >> hurt the BMP use >> case (the majority of the possible real world use scenarios) with a 10% >> bigger bytecode size. 
>> >> -Sherman >> >> >> >> public class Character extends java.lang.Object { >> public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; >> >> public static final int MAX_CODE_POINT = 1114111; >> >> public Character(); >> Code: >> 0: aload_0 1: invokespecial #1 // Method >> java/lang/Object."":()V >> 4: return >> public static boolean isSupplementaryCodePoint(int); >> Code: >> 0: iload_0 1: ldc #2 // int >> 65536 >> 3: if_icmplt 16 >> 6: iload_0 7: ldc #3 // int >> 1114111 >> 9: if_icmpgt 16 >> 12: iconst_1 13: goto 17 >> 16: iconst_0 17: ireturn >> public static boolean isSupplementaryCodePoint_new(int); >> Code: >> 0: iload_0 1: bipush 16 >> 3: iushr 4: istore_1 >> 5: iload_1 6: ifeq 19 >> 9: iload_1 10: bipush 17 >> 12: if_icmpge 19 >> 15: iconst_1 16: goto 20 >> 19: iconst_0 20: ireturn } >> >> >> >> >> >> >> From Ulf.Zibis at gmx.de Fri Mar 19 20:29:37 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 19 Mar 2010 21:29:37 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de> <4BA09478.90809@gmx.de> <4BA09CDC.1040402@gmx.de> <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> Message-ID: <4BA3DEB1.9080303@gmx.de> Am 17.03.2010 16:46, schrieb Martin Buchholz: > On Wed, Mar 17, 2010 at 01:11, Ulf Zibis wrote: > >> Am I mad ??? >> >> 2nd. 
correction: >> >> But >> for (int i = offset; i< offset + count; i++) { >> int c = codePoints[i]; >> char plane = (char)(c>>> 16); >> if (plane == 0) >> n += 1; >> else if (plane< 0x11) >> n += 2; >> else throw new IllegalArgumentException(Integer.toString(c)); >> } >> has too only 2 branches and additionally could benefit from tiny 16-bit >> comparisons. >> The shift additionally could be omitted on CPU's which can benefit from >> 6933327. >> > I'm not a x86 or hotspot expert, but I would think that the "plane" > variable is never written to memory, but lives only in a register, > so I see only drawbacks to making plane a "char". > The char is not important here, maybe give hotspot a hint that value is always positive 16-bit. My idea was to indicate this to the reader. I saw, that you use to set a space after casts, why? Cast is a one-operand operator like - -- ++. This a rare style in the JDK sources which "disturbs" my eyes. ;-) -Ulf From martinrb at google.com Fri Mar 19 20:47:22 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 19 Mar 2010 13:47:22 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA3DEB1.9080303@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de> <4BA09478.90809@gmx.de> <4BA09CDC.1040402@gmx.de> <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> <4BA3DEB1.9080303@gmx.de> Message-ID: <1ccfd1c11003191347p1f633502kc8ab34ef119f2a29@mail.gmail.com> On Fri, Mar 19, 2010 at 13:29, schrieb Ulf Zibis : > Am 17.03.2010 16:46, schrieb Martin Buchholz: > The char is not important here, maybe give hotspot a hint that value is > always positive 16-bit. My idea was to indicate this to the reader. 
I think naming the variable "plane" and using the ">>>" operator do a good job of making this hint to the reader. > > I saw, that you use to set a space after casts, why? Cast is a one-operand > operator like - -- ++. This a rare style in the JDK sources which "disturbs" > my eyes. ;-) The JDK code I have maintained uses space after cast. We don't have a really well-maintained coding standard, but the closest thing we do have agrees with me: http://java.sun.com/docs/codeconv/html/CodeConventions.doc7.html#475 Nevertheless, you are right - I was surprised that space after cast is less popular in the JDK sources. Martin From Ulf.Zibis at gmx.de Fri Mar 19 21:27:00 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 19 Mar 2010 22:27:00 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA10BEC.8050105@sun.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA00564.4010104@gmx.de> <4BA09478.90809@gmx.de> <4BA09CDC.1040402@gmx.de> <1ccfd1c11003170846n19c3e273v71daff3a755c58a4@mail.gmail.com> <4BA10BEC.8050105@sun.com> Message-ID: <4BA3EC24.7010809@gmx.de> Am 17.03.2010 18:05, schrieb Xueming Shen: > Martin Buchholz wrote: >> On Wed, Mar 17, 2010 at 01:11, Ulf Zibis wrote: >>> Am I mad ??? >>> >>> 2nd. correction: >>> >>> But >>> for (int i = offset; i < offset + count; i++) { >>> int c = codePoints[i]; >>> char plane = (char)(c >>> 16); >>> if (plane == 0) >>> n += 1; >>> else if (plane < 0x11) >>> n += 2; >>> else throw new >>> IllegalArgumentException(Integer.toString(c)); >>> } >>> has too only 2 branches and additionally could benefit from tiny 16-bit >>> comparisons. 
>>> The shift additionally could be omitted on CPU's which can benefit from >>> 6933327. >> >> I'm not a x86 or hotspot expert, but I would think that the "plane" >> variable is never written to memory, but lives only in a register, >> so I see only drawbacks to making plane a "char". >> > I doubt there is any benefit to use a 8-bit or 16-bit operand on a > 32-bit/64-bit machine. > While optimization is definitely good, but it might not be a good > habit to code in high-level > program language while thinking in assembly every each minute:-) > let's leave those > optimization to hotspot engineer:-) Yes, you are right. Unfortunately they are behind on such things. As I said, the "(c >>> 16) == 0"-trick will lose its justification if the JIT becomes smart enough to convert a range check into a single unsigned compare. -Ulf > In this particular case, given most application will never use > supplementary character, I > doubt it really worth the optimization and I would definitely not try > to change the impl > of isSupplementaryCP to make this code "better". If you really really > want to optimize > this code the alternative is to have a package private > Character.getPlane(), or simply > to use your optimized code above. I would suggest to use int for > plane, instead of char or > byte. 
> > -Sherman > > > > > From Ulf.Zibis at gmx.de Fri Mar 19 21:46:29 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 19 Mar 2010 22:46:29 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA007A4.2030907@sun.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> Message-ID: <4BA3F0B5.1070404@gmx.de> Am 16.03.2010 23:35, schrieb Xueming Shen: > Martin Buchholz wrote: >> >> The primary theory here is that branches are expensive, >> and we are reducing them by one. >> > > There are still two branches in new impl, if you count the "ifeq" and > "if_icmpge"(?) > > We are trying to "optimize" this piece of code with the assumption > that the new impl MIGHT help certain vm (hotspot?) > to optimize certain use scenario (some consecutive usages), if the > compiler and/or the vm are both smart enough at certain > point, with no supporting benchmark data? I've finished the benchmark: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/src/java/lang/CharacterBenchmark.java?rev=1006&view=log The results: time1: 2316,213 ms ..à la Martin time2: 1267,063 ms time3: 1245,972 ms ..using isValidCodePoint time4: 1467,570 ms ..validate version (slower, because of unreasonable HotSpot optimizing, see "C2 optimization bug ?" 
in hotspot-compiler-dev list) Here see the disassembly snippets: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/log/PA_Character_compare.txt?rev=1007&view=markup Detailed: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/log/PA_Character.xml?rev=1006&view=markup Little NetBeans project: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/ Now I have two patches in my mq queue. Martin, how do I create 2 exports in the form, you would like? Should I use hg export with some magic option? -Ulf From martinrb at google.com Sat Mar 20 00:13:13 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 19 Mar 2010 17:13:13 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA3F0B5.1070404@gmx.de> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> Message-ID: <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> Interesting benchmark results! Your microbenchmark technique looks unusual, but seems to work. I'm surprised there is that much difference. I would take out the swallowing of Exception. --- Your data contains only supplementary characters, which we are assuming are very rare. So I don't consider speeding up such a benchmark very important, but.... We can do it for free by switching isSupplementaryCodePoint => isValidCodePoint, so why not? 
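Editorial sketch, not part of the original message: the BMP-first ordering argued for in this thread can be illustrated with plane-based checks along the lines of the `(c >>> 16)` snippets quoted earlier. The class name `PlaneChecks` and its method names are hypothetical and chosen only for illustration; this does not claim to reproduce the code in any of the webrevs linked in the thread.

```java
// Illustrative only: PlaneChecks and its methods are hypothetical names,
// not the JDK's Character API.
class PlaneChecks {

    /** True when cp lies in plane 0 (U+0000..U+FFFF). */
    static boolean isBmp(int cp) {
        return (cp >>> 16) == 0;
    }

    /**
     * True when cp lies in planes 0..16 (U+0000..U+10FFFF).
     * A negative cp has its high bits set, so the unsigned shift
     * yields a plane of 0x8000 or more and the test fails.
     */
    static boolean isValid(int cp) {
        return (cp >>> 16) < 0x11;
    }

    /** Counts the UTF-16 chars needed, testing the common BMP case first. */
    static int charCount(int[] codePoints) {
        int n = 0;
        for (int c : codePoints) {
            if (isBmp(c)) {
                n += 1;
            } else if (isValid(c)) {
                n += 2; // must be supplementary, since the BMP case was excluded
            } else {
                throw new IllegalArgumentException(Integer.toString(c));
            }
        }
        return n;
    }
}
```

Because the BMP test runs first, the second branch only needs the cheaper validity test rather than a full supplementary-range check, which is the substitution being proposed here.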
---

While checking this, I noticed that Character.toChars can
be sped up by using our new isBMPCodePoint method
(always optimize for BMP!)

---

Here's the change I'm making on top of isBMPCodePoint:
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/

Ulf, please review.

diff --git a/src/share/classes/java/lang/Character.java b/src/share/classes/java/lang/Character.java
--- a/src/share/classes/java/lang/Character.java
+++ b/src/share/classes/java/lang/Character.java
@@ -3099,15 +3099,15 @@
      * @since 1.5
      */
     public static int toChars(int codePoint, char[] dst, int dstIndex) {
-        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
+        if (isBMPCodePoint(codePoint)) {
+            dst[dstIndex] = (char) codePoint;
+            return 1;
+        } else if (isValidCodePoint(codePoint)) {
+            toSurrogates(codePoint, dst, dstIndex);
+            return 2;
+        } else {
             throw new IllegalArgumentException();
         }
-        if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
-            dst[dstIndex] = (char) codePoint;
-            return 1;
-        }
-        toSurrogates(codePoint, dst, dstIndex);
-        return 2;
     }

     /**
@@ -3127,15 +3127,15 @@
      * @since 1.5
      */
     public static char[] toChars(int codePoint) {
-        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
+        if (isBMPCodePoint(codePoint)) {
+            return new char[] { (char) codePoint };
+        } else if (isValidCodePoint(codePoint)) {
+            char[] result = new char[2];
+            toSurrogates(codePoint, result, 0);
+            return result;
+        } else {
             throw new IllegalArgumentException();
         }
-        if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
-            return new char[] { (char) codePoint };
-        }
-        char[] result = new char[2];
-        toSurrogates(codePoint, result, 0);
-        return result;
     }

     static void toSurrogates(int codePoint, char[] dst, int index) {
diff --git a/src/share/classes/java/lang/String.java b/src/share/classes/java/lang/String.java
--- a/src/share/classes/java/lang/String.java
+++ b/src/share/classes/java/lang/String.java
@@ -281,7 +281,7 @@
             int c = codePoints[i];
             if (Character.isBMPCodePoint(c))
                 n += 1;
-            else if (Character.isSupplementaryCodePoint(c))
+            else if (Character.isValidCodePoint(c))
                 n += 2;
             else throw new IllegalArgumentException(Integer.toString(c));
         }
diff --git a/src/share/classes/sun/nio/cs/Surrogate.java b/src/share/classes/sun/nio/cs/Surrogate.java
--- a/src/share/classes/sun/nio/cs/Surrogate.java
+++ b/src/share/classes/sun/nio/cs/Surrogate.java
@@ -294,7 +294,7 @@
                 dst.put((char)uc);
                 error = null;
                 return 1;
-            } else if (Character.isSupplementaryCodePoint(uc)) {
+            } else if (Character.isValidCodePoint(uc)) {
                 if (dst.remaining() < 2) {
                     error = CoderResult.OVERFLOW;
                     return -1;
@@ -338,7 +338,7 @@
                 da[dp] = (char)uc;
                 error = null;
                 return 1;
-            } else if (Character.isSupplementaryCodePoint(uc)) {
+            } else if (Character.isValidCodePoint(uc)) {
                 if (dl - dp < 2) {
                     error = CoderResult.OVERFLOW;
                     return -1;

Martin

On Fri, Mar 19, 2010 at 14:46, Ulf Zibis wrote:
> On 16.03.2010 23:35, Xueming Shen wrote:
>>
>> Martin Buchholz wrote:
>>>
>>> The primary theory here is that branches are expensive,
>>> and we are reducing them by one.
>>>
>>
>> There are still two branches in new impl, if you count the "ifeq" and
>> "if_icmpge"(?)
>>
>> We are trying to "optimize" this piece of code with the assumption that
>> the new impl MIGHT help certain vm (hotspot?)
>> to optimize certain use scenario (some consecutive usages), if the
>> compiler and/or the vm are both smart enough at certain
>> point, with no supporting benchmark data?
>
> I've finished the benchmark:
> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/src/java/lang/CharacterBenchmark.java?rev=1006&view=log
>
> The results:
> time1: 2316,213 ms  ..à la Martin
> time2: 1267,063 ms
> time3: 1245,972 ms  ..using isValidCodePoint
> time4: 1467,570 ms  ..validate version (slower, because of unreasonable
> HotSpot optimizing, see "C2 optimization bug ?"
in hotspot-compiler-dev
> list)
>
> Here are the disassembly snippets:
> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/JDK-7/j_l_Character_charCount/log/PA_Character_compare.txt?rev=1007&view=markup
>

From martinrb at google.com Sat Mar 20 00:17:59 2010
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 19 Mar 2010 17:17:59 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA3F0B5.1070404@gmx.de>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de>
Message-ID: <1ccfd1c11003191717g5ff27fd1m100985fd780fabc4@mail.gmail.com>

On Fri, Mar 19, 2010 at 14:46, Ulf Zibis wrote:
> On 16.03.2010 23:35, Xueming Shen wrote:
> Now I have two patches in my mq queue.
> Martin, how do I create 2 exports in the form you would like?

Just copy the patch files to some public web-accessible place,
as I do with cr.openjdk.java.net.
It seems you do that with your java.net projects, but
you should really put them on cr.openjdk.java.net,
since it's designed for that purpose.

Martin

From kelly.ohair at sun.com Sat Mar 20 01:18:24 2010
From: kelly.ohair at sun.com (kelly.ohair at sun.com)
Date: Sat, 20 Mar 2010 01:18:24 +0000
Subject: hg: jdk7/tl: 6936788: Minor adjustment to top repo test/Makefile, missing non-zero exit case
Message-ID: <20100320011825.0F4AE44311@hg.openjdk.java.net>

Changeset: 35d272ef7598
Author: ohair
Date: 2010-03-19 18:17 -0700
URL: http://hg.openjdk.java.net/jdk7/tl/rev/35d272ef7598

6936788: Minor adjustment to top repo test/Makefile, missing non-zero exit case
Reviewed-by: jjg

! test/Makefile

From martinrb at google.com Sat Mar 20 05:01:07 2010
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 19 Mar 2010 22:01:07 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com>
Message-ID: <1ccfd1c11003192201s222a233fi6f045d4d0e88febb@mail.gmail.com>

Here's another little improvement that should use isBMPCodePoint:

diff --git a/src/share/classes/java/lang/AbstractStringBuilder.java b/src/share/classes/java/lang/AbstractStringBuilder.java
--- a/src/share/classes/java/lang/AbstractStringBuilder.java
+++ b/src/share/classes/java/lang/AbstractStringBuilder.java
@@ -719,20 +719,17 @@
      *         {@code codePoint} isn't a valid Unicode code point
      */
     public AbstractStringBuilder appendCodePoint(int codePoint) {
-        if (!Character.isValidCodePoint(codePoint)) {
+        if (Character.isBMPCodePoint(codePoint)) {
+            ensureCapacityInternal(count + 1);
+            value[count] = (char) codePoint;
+            count += 1;
+        } else if (Character.isValidCodePoint(codePoint)) {
+            ensureCapacityInternal(count + 2);
+            Character.toSurrogates(codePoint, value, count);
+            count += 2;
+        } else {
             throw new IllegalArgumentException();
         }
-        int n = 1;
-        if (codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
-            n++;
-        }
-        ensureCapacityInternal(count + n);
-        if (n == 1) {
-            value[count++] = (char) codePoint;
-        } else {
-            Character.toSurrogates(codePoint, value, count);
-            count += n;
-        }
         return this;
     }


Martin

From martinrb at google.com Sat Mar 20 18:36:22 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sat, 20 Mar 2010 11:36:22 -0700
Subject: String.lastIndexOf confused by unpaired trailing surrogate
Message-ID: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com>

For a change, here's an actual plain old "incorrect result" bug fix
for String.lastIndexOf

Sherman, please file a bug and review.

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/

Also includes our usual performance-oriented fiddling.

public class LastIndexOf {
    public static void main(String[] args) {
        int ch = 0x10042;
        char[] bug = new char[3];
        Character.toChars(ch, bug, 0);
        bug[2] = bug[0];
        System.out.println(new String(bug).lastIndexOf(ch));
        bug[2] = '!';
        System.out.println(new String(bug).lastIndexOf(ch));
    }
}
==> javac -source 1.6 -Xlint:all LastIndexOf.java
==> java -esa -ea LastIndexOf
-1
0

From Ulf.Zibis at gmx.de Sat Mar 20 21:52:32 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sat, 20 Mar 2010 22:52:32 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com>
Message-ID: <4BA543A0.2060600@gmx.de>

On 20.03.2010 01:13, Martin Buchholz wrote:
> Interesting benchmark results!
>
> Your microbenchmark technique looks unusual, but seems to work.
>
- yes, warmup is integrated without the need to code an extra loop
-- here is the more sophisticated version, which detects slowdowns caused by GC, HotSpot or OS activity (line 245 ...):
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/j7_EUC_TW/src/sun/nio/cs/ext/EUC_TWBenchmark.java?rev=1008&view=markup
- my technique mostly eliminates the influence of irregular system slowdowns
-- this is of special importance on mobile systems, where the CPU clock may be throttled because of overheating

> I'm surprised there is that much difference.
>
I wasn't, after studying the disassemblies.

> I would take out the swallowing of Exception.
>
Thanks, caused by copy'n'paste. ;-)

> ---
>
> Your data contains only supplementary characters,
> which we are assuming are very rare.
>
Yes, on BMP characters all variations have the same speed.

> So I don't consider speeding up such a benchmark
> very important,
Yes, even on Taiwanese machines using EUC_TW, the frequency of surrogates may be 1 %, but character processing as a whole should be frequent in many applications.
On the other hand, the frequency of other APIs, e.g. array sort, should also be low across applications overall.
Does that justify neglected code, particularly if we can have the improvement for more or less nothing?

> but....
>
> We can do it for free
> by switching isSupplementaryCodePoint => isValidCodePoint,
> so why not?
>
Yep, I only cursorily revised the other methods; my focus was on String(int[], int, int) and on outsourcing the bounds checks.
BTW, what do you think of the latter?

> ---
>
> While checking this, I noticed that Character.toChars can
> be sped up by using our new isBMPCodePoint method
> (always optimize for BMP!)
>
I guess you have noticed that I had already made the main change earlier. But I couldn't imagine that we would drop the optimized form of isSupplementaryCodePoint().
;-)

> ---
>
> Here's the change I'm making on top of isBMPCodePoint:
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/
>
> Ulf, please review.
>
- Have you noticed my change to toSurrogates(), which now returns the count? It is useful too in AbstractStringBuilder#appendCodePoint() and in plenty of sun.nio.cs decoders.
- A little "bug" in javadoc:
  @exception ArrayIndexOutOfBoundsException
  instead of IndexOutOfBoundsException
- String#indexOf(int, int):
            // handle supplementary characters here
            char high = Character.highSurrogate(ch);
            char low = Character.lowSurrogate(ch);
            for ( ; i < max-1; i++)
                if (v[i] == high && v[i+1] == low)
                    return i - offset;

-Ulf

From Ulf.Zibis at gmx.de Sat Mar 20 22:05:53 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sat, 20 Mar 2010 23:05:53 +0100
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com>
Message-ID: <4BA546C1.5040201@gmx.de>

Good catch!
Additionally consider my additional twiddling on indexOf.

-Ulf


On 20.03.2010 19:36, Martin Buchholz wrote:
> For a change, here's an actual plain old "incorrect result" bug fix
> for String.lastIndexOf
>
> Sherman, please file a bug and review.
>
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/
>
> Also includes our usual performance-oriented fiddling.
>
> public class LastIndexOf {
>     public static void main(String[] args) {
>         int ch = 0x10042;
>         char[] bug = new char[3];
>         Character.toChars(ch, bug, 0);
>         bug[2] = bug[0];
>         System.out.println(new String(bug).lastIndexOf(ch));
>         bug[2] = '!';
>         System.out.println(new String(bug).lastIndexOf(ch));
>     }
> }
> ==> javac -source 1.6 -Xlint:all LastIndexOf.java
> ==> java -esa -ea LastIndexOf
> -1
> 0
>
>

From Ulf.Zibis at gmx.de Sat Mar 20 22:50:49 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sat, 20 Mar 2010 23:50:49 +0100
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <4BA546C1.5040201@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de>
Message-ID: <4BA55149.5070603@gmx.de>

Oops, later I looked at your webrev and saw that you had the same idea at the same time, while I was composing my email before last.

Why don't you outsource indexOfBMP, lastIndexOfBMP? Or, to be honest, IMO it is too much source code + byte code overhead for a 3-liner that is used only once.

I doubt that all the finals will have any benefit. Some time ago I too fell into that trap, or am I wrong? Examine the disassembly.

-Ulf


On 20.03.2010 23:05, Ulf Zibis wrote:
> Good catch!
> Additionally consider my additional twiddling on indexOf.
>
> -Ulf
>
>
> On 20.03.2010 19:36, Martin Buchholz wrote:
>> For a change, here's an actual plain old "incorrect result" bug fix
>> for String.lastIndexOf
>>
>> Sherman, please file a bug and review.
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/
>>
>> Also includes our usual performance-oriented fiddling.
>>
>> public class LastIndexOf {
>>     public static void main(String[] args) {
>>         int ch = 0x10042;
>>         char[] bug = new char[3];
>>         Character.toChars(ch, bug, 0);
>>         bug[2] = bug[0];
>>         System.out.println(new String(bug).lastIndexOf(ch));
>>         bug[2] = '!';
>>         System.out.println(new String(bug).lastIndexOf(ch));
>>     }
>> }
>> ==> javac -source 1.6 -Xlint:all LastIndexOf.java
>> ==> java -esa -ea LastIndexOf
>> -1
>> 0
>>
>>
>
>

From Ulf.Zibis at gmx.de Sat Mar 20 23:50:30 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 00:50:30 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003191717g5ff27fd1m100985fd780fabc4@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191717g5ff27fd1m100985fd780fabc4@mail.gmail.com>
Message-ID: <4BA55F46.7010501@gmx.de>

On 20.03.2010 01:17, Martin Buchholz wrote:
> On Fri, Mar 19, 2010 at 14:46, Ulf Zibis wrote:
>
>> Now I have two patches in my mq queue.
>> Martin, how do I create 2 exports in the form you would like?
>>
> Just copy the patch files to some public web-accessible place,
> as I do with cr.openjdk.java.net.

I just looked into .hg\patches.
I didn't imagine that things could be so simple.

I would be happy to have access to cr.openjdk.java.net.
-Ulf

From Ulf.Zibis at gmx.de Sun Mar 21 00:13:47 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 01:13:47 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com>
Message-ID: <4BA564BB.9090901@gmx.de>

On 20.03.2010 01:13, Martin Buchholz wrote:
> We can do it for free
> by switching isSupplementaryCodePoint => isValidCodePoint,
> so why not?
>

Don't you think we should add a hint to the javadoc to inform the user about the implementation difference between isSupplementaryCodePoint and isValidCodePoint?
It's likely that the user would use isBMPCodePoint and isSupplementaryCodePoint as a pair, not knowing about the performance problem.

But I would rather see 6932837 && 6935994 get fixed, so we could stay with the conservative style.

-Ulf

From Ulf.Zibis at gmx.de Sun Mar 21 00:18:15 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 01:18:15 +0100
Subject: Centralize bounds check in package private methods
Message-ID: <4BA565C7.9070103@gmx.de>

What do you think about this?
See attachment.

-Ulf

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: String_checkBounds
URL:

From Ulf.Zibis at gmx.de Sun Mar 21 00:24:41 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 01:24:41 +0100
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com>
Message-ID: <4BA56749.8020506@gmx.de>

Sherman, please reconsider moving Surrogate.high/low to Character.high/lowSurrogate.

-Ulf


On 20.03.2010 19:36, Martin Buchholz wrote:
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/
>
>
>

From Ulf.Zibis at gmx.de Sun Mar 21 00:30:59 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 01:30:59 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003192201s222a233fi6f045d4d0e88febb@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <1ccfd1c11003192201s222a233fi6f045d4d0e88febb@mail.gmail.com>
Message-ID: <4BA568C3.8040705@gmx.de>

Fast path for BMP. +1

I can't find ensureCapacityInternal() ???

shorter:
    public AbstractStringBuilder appendCodePoint(int codePoint) {
        int count = this.count;
        if (Character.isBMPCodePoint(codePoint)) {
            ensureCapacityInternal(count + 1);
            value[count++] = (char) codePoint;
        } else if (Character.isValidCodePoint(codePoint)) {
            ensureCapacityInternal(count + 2);
            count += Character.toSurrogates(codePoint, value, count);
        } else
            throw new IllegalArgumentException();
        return this;
    }

-Ulf


On 20.03.2010 06:01, Martin Buchholz wrote:
> Here's another little improvement that should use isBMPCodePoint:
>
> diff --git a/src/share/classes/java/lang/AbstractStringBuilder.java
> b/src/share/classes/java/lang/AbstractStringBuilder.java
> --- a/src/share/classes/java/lang/AbstractStringBuilder.java
> +++ b/src/share/classes/java/lang/AbstractStringBuilder.java
> @@ -719,20 +719,17 @@
>       *         {@code codePoint} isn't a valid Unicode code point
>       */
>      public AbstractStringBuilder appendCodePoint(int codePoint) {
> -        if (!Character.isValidCodePoint(codePoint)) {
> +        if (Character.isBMPCodePoint(codePoint)) {
> +            ensureCapacityInternal(count + 1);
> +            value[count] = (char) codePoint;
> +            count += 1;
> +        } else if (Character.isValidCodePoint(codePoint)) {
> +            ensureCapacityInternal(count + 2);
> +            Character.toSurrogates(codePoint, value, count);
> +            count += 2;
> +        } else {
>              throw new IllegalArgumentException();
>          }
> -        int n = 1;
> -        if (codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
> -            n++;
> -        }
> -        ensureCapacityInternal(count + n);
> -        if (n == 1) {
> -            value[count++] = (char) codePoint;
> -        } else {
> -            Character.toSurrogates(codePoint, value, count);
> -            count += n;
> -        }
>          return this;
>      }
>
>
> Martin
>
>
>

From martinrb at google.com Sun Mar 21 07:14:44 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 21 Mar 2010 00:14:44 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA568C3.8040705@gmx.de>
References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de>
    <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <1ccfd1c11003192201s222a233fi6f045d4d0e88febb@mail.gmail.com> <4BA568C3.8040705@gmx.de>
Message-ID: <1ccfd1c11003210014q5dcb6d6fj6e4bfc0975d6e98c@mail.gmail.com>

On Sat, Mar 20, 2010 at 17:30, Ulf Zibis wrote:
> Fast path for BMP. +1
>
> I don't find ensureCapacityInternal() ???

My patches are dependent on each other.
I've changed my patch publishing script to
publish my entire .hg/patches/ subrepo,
so that others can import my patches,
including their order.

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/patches/

> shorter:
>     public AbstractStringBuilder appendCodePoint(int codePoint) {
>         int count = this.count;

You need to update this.count, not count, below.
To avoid this very common class of bugs,
I use final when I cache a field in a local of the same name.

Anyways, I adopted your caching of count, in isBMPCodePoint2.

>         if (Character.isBMPCodePoint(codePoint)) {
>             ensureCapacityInternal(count + 1);
>             value[count++] = (char) codePoint;
>         } else if (Character.isValidCodePoint(codePoint)) {
>             ensureCapacityInternal(count + 2);
>             count += Character.toSurrogates(codePoint, value, count);

I'm sorry, I dislike methods that always return the same
value, just to make some client code a little shorter.
Character.toSurrogates should return void.

Martin

From martinrb at google.com Sun Mar 21 07:20:20 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 21 Mar 2010 00:20:20 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA564BB.9090901@gmx.de>
References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA564BB.9090901@gmx.de>
Message-ID: <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com>

On Sat, Mar 20, 2010 at 17:13, Ulf Zibis wrote:
> On 20.03.2010 01:13, Martin Buchholz wrote:

> Don't you think we should add a hint to javadoc to inform the user about the
> implementation difference between isSupplementaryCodePoint and
> isValidCodePoint?

No.

> It's likely, the user would use isBMPCodePoint and isSupplementaryCodePoint
> as pair, not knowing about the performance problem.

I don't think it's a performance problem in the real world.
We don't usually put such performance information in the javadoc.

Can you demonstrate a performance advantage of your
implementation of isSupplementaryCodePoint
for BMP characters, when there is no call to isBMPCodePoint?
(Such a demonstration typically requires testing on
a large variety of systems and JITs)

Martin

From martinrb at google.com Sun Mar 21 07:24:02 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 21 Mar 2010 00:24:02 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA55F46.7010501@gmx.de>
References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191717g5ff27fd1m100985fd780fabc4@mail.gmail.com> <4BA55F46.7010501@gmx.de>
Message-ID: <1ccfd1c11003210024y2097a2acmfd0b729871f02397@mail.gmail.com>

On Sat, Mar 20, 2010 at 16:50, Ulf Zibis wrote:
> On 20.03.2010 01:17, Martin Buchholz wrote:
>>
>> On Fri, Mar 19, 2010 at 14:46, Ulf Zibis wrote:
>>
>>>
>>> Now I have two patches in my mq queue.
>>> Martin, how do I create 2 exports in the form, you would like?
>>>
>>
>> Just copy the patch files to some public web-accessible place,
>> as I do with cr.openjdk.java.net.
>
> Just looked into .hg\patches.
> Didn't imagine, that things could be so simple.

Mercurial does try to have mechanisms as simple as possible.
I really do think of my in-progress jdk development as being
the set of patch files in the .hg/patches directory.

Martin

From martinrb at google.com Sun Mar 21 07:56:53 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 21 Mar 2010 00:56:53 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA543A0.2060600@gmx.de>
References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de>
Message-ID: <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com>

On Sat, Mar 20, 2010 at 14:52, Ulf Zibis wrote:
> On 20.03.2010 01:13, Martin Buchholz wrote:

> Yep, I only cursorily revised the other methods; my focus was on String(int[],
> int, int) and on outsourcing the bounds checks.
> BTW, what do you think of the latter?

I've done a lot of work on "out-lining" error handling.
The state of the art appears to be (from LinkedList)

    /**
     * Tells if the argument is the index of an existing element.
     */
    private boolean isElementIndex(int index) {
        return index >= 0 && index < size;
    }

    /**
     * Tells if the argument is the index of a valid position for an
     * iterator or an add operation.
     */
    private boolean isPositionIndex(int index) {
        return index >= 0 && index <= size;
    }

    /**
     * Constructs an IndexOutOfBoundsException detail message.
     * Of the many possible refactorings of the error handling code,
     * this "outlining" performs best with both server and client VMs.
     */
    private String outOfBoundsMsg(int index) {
        return "Index: "+index+", Size: "+size;
    }

    private void checkElementIndex(int index) {
        if (!isElementIndex(index))
            throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    }

    private void checkPositionIndex(int index) {
        if (!isPositionIndex(index))
            throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    }

Study also:
http://code.google.com/p/google-collections/source/browse/trunk/src/com/google/common/base/Preconditions.java

> - A little "bug" in javadoc:
>   @exception ArrayIndexOutOfBoundsException
>   instead of IndexOutOfBoundsException

Not a bug. You do realize AIOOBE is a subclass of IOOBE?

> - String#indexOf(int, int):
>             // handle supplementary characters here
>             char high = Character.highSurrogate(ch);
>             char low = Character.lowSurrogate(ch);
>             for ( ; i < max-1; i++)
>                 if (v[i] == high && v[i+1] == low)
>                         return i - offset;

I now believe we should provide
Character.highSurrogate and Character.lowSurrogate
as you have been advocating.
If Sherman agrees, let's put a proper patch for this together.

Martin

From martinrb at google.com Sun Mar 21 08:05:52 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 21 Mar 2010 01:05:52 -0700
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <4BA55149.5070603@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de>
Message-ID: <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com>

On Sat, Mar 20, 2010 at 15:50, Ulf Zibis wrote:
> Oops, later I looked in your webrev and saw your same idea at same time
> while I was composing my before-last email.
>
> Why don't you outsource indexOfBMP, lastIndexOfBMP, or to be sincere IMO to
> much source code + byte code overhead for a only once used 3-liner.

I'm not sure I understand your intent.

> I suspect if all the finals will have any benefit. Some time ago I too felt
> in that trap, or am I wrong. Examine the disassambly.

My use of "final" is almost always for software engineering reasons,
not for performance reasons.

Martin

From martinrb at google.com Sun Mar 21 08:16:05 2010
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 21 Mar 2010 01:16:05 -0700
Subject: Centralize bounds check in package private methods
In-Reply-To: <4BA565C7.9070103@gmx.de>
References: <4BA565C7.9070103@gmx.de>
Message-ID: <1ccfd1c11003210116u571a863ew8ca8d5104e4e8997@mail.gmail.com>

This is definitely on the right track.
We should borrow as many of the ideas and
method names from LinkedList and Preconditions.
But compatibility will force us to maintain our own
versions of these methods.

Someday I'd like to see much of
google-collections/guava in the jdk.
Particularly Preconditions.

Martin

On Sat, Mar 20, 2010 at 17:18, Ulf Zibis wrote:
> What do you think about?
> See attachment.
>
> -Ulf
>
>

From Ulf.Zibis at gmx.de Sun Mar 21 10:00:39 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 11:00:39 +0100
Subject: Centralize bounds check in package private methods
In-Reply-To: <1ccfd1c11003210116u571a863ew8ca8d5104e4e8997@mail.gmail.com>
References: <4BA565C7.9070103@gmx.de> <1ccfd1c11003210116u571a863ew8ca8d5104e4e8997@mail.gmail.com>
Message-ID: <4BA5EE47.6070001@gmx.de>

That's great, it gives me more time for other work.

Preconditions seems a good candidate for the java.util package.

Some of the naming seems a little strange to me.
Example:
    String errorMessageTemplate, Object... errorMessageArgs
Why not just:
    errorFormat, errorArgs
    messageFormat, messageArgs
The term 'format' is known from printf.

Have you noticed the "creative" variations on messages in class AbstractStringBuilder?

-Ulf


On 21.03.2010 09:16, Martin Buchholz wrote:
> This is definitely on the right track.
> We should borrow as many of the ideas and
> method names from LinkedList and Preconditions.
> But compatibility will force us to maintain our own
> versions of these methods.
>
> Someday I'd like to see much of
> google-collections/guava in the jdk.
> Particularly Preconditions.
>
> Martin
>
> On Sat, Mar 20, 2010 at 17:18, Ulf Zibis wrote:
>
>> What do you think about?
>> See attachment.
>>
>> -Ulf
>>
>>
>>
>
>

From Ulf.Zibis at gmx.de Sun Mar 21 10:13:25 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 11:13:25 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com>
References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com>
Message-ID: <4BA5F145.2020408@gmx.de>

On 21.03.2010 08:56, Martin Buchholz wrote:
> On Sat, Mar 20, 2010 at 14:52, Ulf Zibis wrote:
>
>> On 20.03.2010 01:13, Martin Buchholz wrote:
>> Yep, I only cursorily revised the other methods; my focus was on String(int[],
>> int, int) and on outsourcing the bounds checks.
>> BTW, what do you think of the latter?
>>
> I've done a lot of work on "out-lining" error handling.
> The state of the art appears to be (from LinkedList)
>

Additionally, an idea is cooking on my mental stack: having a final class AbstractString/AbstractCharSequence from which e.g. String and AbstractStringBuilder could inherit.
There are several methods that could be centralized and, if appropriate, overridden in subclasses. Several static methods of class Character could find a home for their code here.

-Ulf

From Ulf.Zibis at gmx.de Sun Mar 21 10:24:49 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Sun, 21 Mar 2010 11:24:49 +0100
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com>
Message-ID: <4BA5F3F1.8080609@gmx.de>

On 21.03.2010 09:05, Martin Buchholz wrote:
> On Sat, Mar 20, 2010 at 15:50, Ulf Zibis wrote:
>
>> Why don't you outsource indexOfBMP, lastIndexOfBMP, or to be sincere IMO to
>> much source code + byte code overhead for a only once used 3-liner.
>>
> I'm not sure I understand your intent.
>

I think we should not define a distinct method for this once-used 3-liner:
            for (; i < max-1; i++)
                if (v[i] == high && v[i+1] == low)
                    return i - offset;
HotSpot's resources should not be over-stressed by inlining such things, leaving more reserves for more important things.

>
>> I suspect if all the finals will have any benefit. Some time ago I too felt
>> in that trap, or am I wrong. Examine the disassambly.
>>
> My use of "final" is almost always for software engineering reasons,
> not for performance reasons.
>

Ah, ok, just a kind of coding style.
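
For reference, the three-line surrogate-pair scan under discussion can be sketched as a stand-alone program. This is only an illustration under stated assumptions: it uses Character.highSurrogate and Character.lowSurrogate as they exist in JDK 7 and later, works on a plain char[] without String's internal offset, and the names SupplementarySearch and indexOfSupplementary are hypothetical, not taken from the webrev.

```java
public class SupplementarySearch {

    /**
     * Returns the index of the first occurrence of the supplementary
     * code point cp in v, or -1 if it does not occur.
     */
    static int indexOfSupplementary(char[] v, int cp) {
        if (!Character.isSupplementaryCodePoint(cp)) {
            throw new IllegalArgumentException(Integer.toString(cp));
        }
        final char high = Character.highSurrogate(cp);
        final char low = Character.lowSurrogate(cp);
        // Stop one short of the end: a pair needs two chars.
        for (int i = 0; i < v.length - 1; i++) {
            if (v[i] == high && v[i + 1] == low) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int cp = 0x10042;
        char[] pair = Character.toChars(cp);
        // 'x', then the surrogate pair, then an unpaired high surrogate
        char[] v = { 'x', pair[0], pair[1], pair[0] };
        System.out.println(indexOfSupplementary(v, cp)); // prints 1
    }
}
```

Because both halves of the pair are compared, the unpaired trailing high surrogate in the sample array cannot produce a false match.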
-Ulf From Ulf.Zibis at gmx.de Sun Mar 21 10:29:04 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 11:29:04 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210014q5dcb6d6fj6e4bfc0975d6e98c@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <1ccfd1c11003192201s222a233fi6f045d4d0e88febb@mail.gmail.com> <4BA568C3.8040705@gmx.de> <1ccfd1c11003210014q5dcb6d6fj6e4bfc0975d6e98c@mail.gmail.com> Message-ID: <4BA5F4F0.7040905@gmx.de> On 21.03.2010 08:14, Martin Buchholz wrote: > On Sat, Mar 20, 2010 at 17:30, Ulf Zibis wrote: > > > You need to update this.count, not count, below. > To avoid this very common class of bugs, > I use final when I cache a field in a local of the same name. Oops, it was late last night.
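[Editorial sketch] The habit described in the quoted reply, declaring a local "final" when it caches a field of the same name, can be illustrated as below. The class CountBuffer and its members are hypothetical and only loosely resemble AbstractStringBuilder; the point is that an accidental "count = count + 1" on the local would be rejected by the compiler.

```java
import java.util.Arrays;

// Hypothetical buffer class, made up for this illustration.
class CountBuffer {
    private char[] value = new char[2];
    private int count;

    CountBuffer append(char c) {
        // final local shadowing the field: assigning to it is a compile error,
        // which forces the update to go through this.count.
        final int count = this.count;
        if (count == value.length)
            value = Arrays.copyOf(value, count * 2);
        value[count] = c;
        this.count = count + 1;   // update the field, not the cached local
        return this;
    }

    int length() {
        return count;
    }

    public static void main(String[] args) {
        CountBuffer b = new CountBuffer().append('a').append('b').append('c');
        System.out.println(b.length());
    }
}
```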
-Ulf From Ulf.Zibis at gmx.de Sun Mar 21 11:28:57 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 12:28:57 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA564BB.9090901@gmx.de> <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com> Message-ID: <4BA602F9.7000408@gmx.de> > > On Sat, Mar 20, 2010 at 17:13, Ulf Zibis wrote: > >> On 20.03.2010 01:13, Martin Buchholz wrote: >> Don't you think we should add a hint to the javadoc to inform the user about the >> implementation difference between isSupplementaryCodePoint and >> isValidCodePoint? >> > No. > > >> It's likely that the user would use isBMPCodePoint and isSupplementaryCodePoint >> as a pair, not knowing about the performance problem. >> > I don't think it's a performance problem in the real world. > Hm, if someone uses: if (Character.isBMPCodePoint(codePoint)) ...; else if (Character.isSupplementaryCodePoint(codePoint)) // instead of isValidCodePoint() ...; else ...; he will lose up to 50% performance, as you can see in my benchmark of isSuppCPAlaMartin(). > We don't usually put such performance information in the javadoc. > In class StringBuilder: "Where possible, it is recommended that this class be used in preference to StringBuffer as it will be faster under most implementations." java.util.List: Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example).
ByteBuffer#get(byte[],int,int): In other words, an invocation of this method of the form src.get(dst, off, len) has exactly the same effect as the loop for (int i = off; i < off + len; i++) dst[i] = src.get(); except that it first checks that there are sufficient bytes in this buffer *and it is potentially much more efficient*. > Can you demonstrate a performance advantage > of your implementation of isSupplementaryCodePoint > for BMP characters, when there is no call to > isBMPCodePoint? (Such a demonstration typically > requires testing on a large variety of systems and JITs) > I'm not sure I understand correctly. I think my benchmark of isSuppCPAlaMartin() would demonstrate that. Anyway, even if isSupplementaryCodePoint() is used in isolation, my code will help the JIT to use 2-byte shifted addressing and a shorter 2-byte immediate value for the compare, but yes, the JIT should be able to catch that without this help. But in that case, we could also stay with the old implementations of isBMPCodePoint and isValidCodePoint. -Ulf -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Ulf.Zibis at gmx.de Sun Mar 21 12:00:00 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 13:00:00 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> Message-ID: <4BA60A40.9050600@gmx.de> > > On Sat, Mar 20, 2010 at 14:52, Ulf Zibis wrote: > >> - A little "bug" in javadoc: >> @exception ArrayIndexOutOfBoundsException >> instead of IndexOutOfBoundsException >> > Not a bug. > Yes, but it decreases the user's ability to catch exceptions more precisely and flexibly. Imagine a method that throws an IndexOutOfBoundsException for some reason and also calls Character.toChars(). The caller of such a method could distinguish where the exception came from, and have separate catch blocks. But if not documented ... :-( In the extreme, the following would not be a bug in your sense either: @exception Exception I became sensitive to this, as I have seen real bugs in AbstractStringBuilder the other way around, where methods actually throw IndexOutOfBoundsExceptions, but their javadoc states StringIndexOutOfBoundsException. It would be a nice game for Easter to invite people to search for those bugs in the JDK code base rather than for coloured eggs. > You do realize AIOOBE is a subclass of IOOBE? > Yes.
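[Editorial sketch] The point about separate catch blocks relies on the standard exception hierarchy: ArrayIndexOutOfBoundsException and StringIndexOutOfBoundsException are both subclasses of IndexOutOfBoundsException. A minimal illustration follows; the class CatchOrderDemo and the helper classify are hypothetical names invented for this example.

```java
import java.util.ArrayList;

// Hypothetical demo of distinguishing the subclass from the general exception.
class CatchOrderDemo {
    static String classify(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            // The more specific subclass must be caught first.
            return "AIOOBE";
        } catch (IndexOutOfBoundsException e) {
            // Everything else in the hierarchy, e.g. from List or String.
            return "IOOBE";
        }
    }

    public static void main(String[] args) {
        int[] a = new int[1];
        System.out.println(classify(() -> { a[2] = 0; }));
        System.out.println(classify(() -> new ArrayList<Integer>().get(0)));
        System.out.println(classify(() -> "abc".charAt(5)));
    }
}
```

A caller can only rely on the first catch block if the javadoc promises the specific subclass, which is the documentation concern raised in the message.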
-Ulf From martinrb at google.com Sun Mar 21 12:35:04 2010 From: martinrb at google.com (Martin Buchholz) Date: Sun, 21 Mar 2010 05:35:04 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA5F3F1.8080609@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> Message-ID: <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> On Sun, Mar 21, 2010 at 03:24, Ulf Zibis wrote: > On 21.03.2010 09:05, Martin Buchholz wrote: >> >> On Sat, Mar 20, 2010 at 15:50, Ulf Zibis wrote: > I think we should not define a distinct method for this once-used 3-liner: > for (; i < max-1; i++) > if (v[i] == high && v[i+1] == low) > return i - offset; > > HotSpot's resources should not be over-stressed with inlining such things, leaving > more reserves for more important things. On the contrary - normally the above code snippet will rarely be executed, and so will normally not be inlined into the caller, which makes it easier for hotspot to inline the caller into its caller. Separate cold code into separate methods. BTW, in case you try to benchmark this, hotspot intrinsifies indexOf by default.
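[Editorial sketch] The advice "separate cold code into separate methods" can be pictured with a simplified search routine: the common BMP case stays in a small method, and the rare surrogate-pair case lives in its own method. The class IndexOfDemo and both method names are assumptions for illustration; this is not the String.indexOf code under review, and it ignores offsets and start indices.

```java
// Hypothetical, simplified illustration of hot/cold method splitting.
class IndexOfDemo {
    // Hot path: small, handles BMP code points (and negative values, which never match).
    static int indexOf(char[] v, int ch) {
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            for (int i = 0; i < v.length; i++)
                if (v[i] == ch)
                    return i;
            return -1;
        }
        return indexOfSupplementary(v, ch);
    }

    // Cold path: rarely executed, kept out of the hot method's body.
    private static int indexOfSupplementary(char[] v, int ch) {
        if (Character.isValidCodePoint(ch)) {
            final char[] pair = Character.toChars(ch);
            final char high = pair[0];
            final char low = pair[1];
            for (int i = 0; i < v.length - 1; i++)
                if (v[i] == high && v[i + 1] == low)
                    return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] text = "a\uD83D\uDE00b".toCharArray();
        System.out.println(indexOf(text, 'b'));
        System.out.println(indexOf(text, 0x1F600));
    }
}
```

Whether the split pays off on a particular JVM is an empirical question, as the thread itself notes.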
Martin From Ulf.Zibis at gmx.de Sun Mar 21 12:53:34 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 13:53:34 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> Message-ID: <4BA616CE.3090003@gmx.de> > > diff --git a/src/share/classes/sun/nio/cs/Surrogate.java > b/src/share/classes/sun/nio/cs/Surrogate.java > --- a/src/share/classes/sun/nio/cs/Surrogate.java > +++ b/src/share/classes/sun/nio/cs/Surrogate.java > @@ -294,7 +294,7 @@ > dst.put((char)uc); > error = null; > return 1; > - } else if (Character.isSupplementaryCodePoint(uc)) { > + } else if (Character.isValidCodePoint(uc)) { > if (dst.remaining()< 2) { > error = CoderResult.OVERFLOW; > return -1; > @@ -338,7 +338,7 @@ > da[dp] = (char)uc; > error = null; > return 1; > - } else if (Character.isSupplementaryCodePoint(uc)) { > + } else if (Character.isValidCodePoint(uc)) { > if (dl - dp< 2) { > error = CoderResult.OVERFLOW; > return -1; > > Have you searched for usages of Surrogate.isNeededFor() and Character.isSupplementaryCodePoint() in sun.nio.cs.**.* and elsewhere? If used paired with *.isBMP.* it should be replaced by Surrogate.isBMP() / Character.isBMPCodePoint(). 
-Ulf From Ulf.Zibis at gmx.de Sun Mar 21 13:06:30 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 14:06:30 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA616CE.3090003@gmx.de> References: <4A95079A.8080803@gmx.de> <4A9578C4.8060801@sun.com> <4B8DA070.3040306@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA616CE.3090003@gmx.de> Message-ID: <4BA619D6.3080907@gmx.de> > >> >> diff --git a/src/share/classes/sun/nio/cs/Surrogate.java >> b/src/share/classes/sun/nio/cs/Surrogate.java >> --- a/src/share/classes/sun/nio/cs/Surrogate.java >> +++ b/src/share/classes/sun/nio/cs/Surrogate.java >> @@ -294,7 +294,7 @@ >> dst.put((char)uc); >> error = null; >> return 1; >> - } else if (Character.isSupplementaryCodePoint(uc)) { >> + } else if (Character.isValidCodePoint(uc)) { >> if (dst.remaining()< 2) { >> error = CoderResult.OVERFLOW; >> return -1; >> @@ -338,7 +338,7 @@ >> da[dp] = (char)uc; >> error = null; >> return 1; >> - } else if (Character.isSupplementaryCodePoint(uc)) { >> + } else if (Character.isValidCodePoint(uc)) { >> if (dl - dp< 2) { >> error = CoderResult.OVERFLOW; >> return -1; >> > > Have you searched for usages of Surrogate.isNeededFor() and > Character.isSupplementaryCodePoint() in sun.nio.cs.**.* and elsewhere? > If used paired with *.isBMP.* it should be replaced by > Surrogate.isBMP() / Character.isBMPCodePoint(). > > -Ulf correction: If used paired with *.isBMP.* it should be replaced by Surrogate.is() / Character.isValidCodePoint(). 
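[Editorial sketch] The pairing described in the correction above can be shown in a few lines: once the BMP case has been ruled out, a plain validity check is enough to identify a supplementary code point, so the second range test of isSupplementaryCodePoint is not needed. The class CodePointClassDemo and the method charsNeeded are hypothetical; the BMP test is written out as a shift so the sketch does not depend on which method names the patch finally adds.

```java
// Hypothetical helper illustrating the isBMP / isValidCodePoint pairing.
class CodePointClassDemo {
    // Returns the number of chars needed to encode cp in UTF-16, or -1 if cp is invalid.
    static int charsNeeded(int cp) {
        if ((cp >>> 16) == 0) {
            return 1;                       // BMP code point (U+0000..U+FFFF)
        } else if (Character.isValidCodePoint(cp)) {
            return 2;                       // BMP already excluded, so supplementary
        } else {
            return -1;                      // negative or above U+10FFFF
        }
    }

    public static void main(String[] args) {
        System.out.println(charsNeeded('A'));
        System.out.println(charsNeeded(0x1F600));
        System.out.println(charsNeeded(0x110000));
    }
}
```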
From Ulf.Zibis at gmx.de Sun Mar 21 13:16:48 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 14:16:48 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> Message-ID: <4BA61C40.8060608@gmx.de> > > On Sun, Mar 21, 2010 at 03:24, Ulf Zibis wrote: > >> Am 21.03.2010 09:05, schrieb Martin Buchholz: >> >>> On Sat, Mar 20, 2010 at 15:50, Ulf Zibis wrote: >>> >> I think, we should not define a distinct method for this once-used 3-liner: >> for (; i< max-1; i++) >> if (v[i] == high&& v[i+1] == low) >> return i - offset; >> >> HotSpots resources should not be over-stressed to inline such things, having >> more reserves for more important things. >> > On the contrary - > normally the above code snippet will rarely be executed, > and so will normally not be inlined into the caller, > which makes it easier for hotspot to inline > the caller into its caller. Separate cold code into > separate methods. > Thanks, I got the idea. But Isn't the push-call-pop-return overhead comparable with those 3 lines here, not to forget the repeated cache-3-values-once-more? -Ulf > BTW, in case you try to benchmark this, > hotspot intrinsifies indexOf by default. 
> > Martin > > > From Ulf.Zibis at gmx.de Sun Mar 21 13:35:33 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 14:35:33 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA61C40.8060608@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> <4BA61C40.8060608@gmx.de> Message-ID: <4BA620A5.5020608@gmx.de> > >> >> On Sun, Mar 21, 2010 at 03:24, Ulf Zibis wrote: >>> On 21.03.2010 09:05, Martin Buchholz wrote: >>>> On Sat, Mar 20, 2010 at 15:50, Ulf Zibis wrote: >>> I think we should not define a distinct method for this once-used >>> 3-liner: >>> for (; i < max-1; i++) >>> if (v[i] == high && v[i+1] == low) >>> return i - offset; >>> >>> HotSpot's resources should not be over-stressed with inlining such >>> things, leaving >>> more reserves for more important things. >> On the contrary - >> normally the above code snippet will rarely be executed, >> and so will normally not be inlined into the caller, >> which makes it easier for hotspot to inline >> the caller into its caller. Separate cold code into >> separate methods. > > Thanks, I got the idea. > > But isn't the push-call-pop-return overhead comparable with those 3 > lines here, not to forget the repeated cache-3-values-once-more? And additionally, the slow, rarely used branch would be set in stone, even if the inline threshold is reached after some time, as the JIT, AFAIK, can't count the frequency of compiled code usage.
-Ulf From Ulf.Zibis at gmx.de Sun Mar 21 15:42:26 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sun, 21 Mar 2010 16:42:26 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030000l6adaddd7wd2084fb29a6cda83@mail.gmail.com> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> Message-ID: <4BA63E62.2010304@gmx.de> Am 21.03.2010 08:56, schrieb Martin Buchholz: > On Sat, Mar 20, 2010 at 14:52, Ulf Zibis wrote: > > > I now believe we should provide > Character.highSurrogate and Character.lowSurrogate > as you have been advocating. > > If Sherman agrees, let's put a proper patch for this together. > - I too would move the charCount logic from String(int[], int, int) to class Character, at least as package private helper method. There just is another charCount method in good neighbourhood. - Additionally, may be a logic to handle invalid surrogate code points would be interesting. I've attached the newest version of my patch, which you can compare with your current state, ignoring some style differences etc. -Ulf -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: Character_charCount URL: From martinrb at google.com Sun Mar 21 16:16:35 2010 From: martinrb at google.com (Martin Buchholz) Date: Sun, 21 Mar 2010 09:16:35 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA60A40.9050600@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <4BA60A40.9050600@gmx.de> Message-ID: <1ccfd1c11003210916rad35d31wb0501f9bf960b07@mail.gmail.com> On Sun, Mar 21, 2010 at 05:00, Ulf Zibis wrote: >> >> On Sat, Mar 20, 2010 at 14:52, Ulf Zibis wrote: >> >>> >>> - A little "bug" in javadoc: >>> @exception ArrayIndexOutOfBoundsException >>> instead of IndexOutOfBoundsException >>> >> >> Not a bug. >> > > Yes, but it decreases the user's ability to catch exceptions more precisely > and flexibly. There is a debate about whether to reuse existing exception classes or to throw class-specific subclasses. IMO, IOOBE is a sufficiently expressive exception that I might have used just that, with expressive detail messages. But that's only a consideration when designing new API or a new platform. Old API must stay unchanged, for compatibility. > Imagine a method that throws an IndexOutOfBoundsException for some reason > and also calls Character.toChars(). The caller of such a method could > distinguish where the exception came from, and have separate catch > blocks. But if not documented ...
:-( > > In the extreme, the following would not be a bug in your sense either: > @exception Exception > > I became sensitive to this, as I have seen real bugs in > AbstractStringBuilder the other way around, where methods actually throw > IndexOutOfBoundsExceptions, but their javadoc states > StringIndexOutOfBoundsException. Now that's a real bug. Martin From martinrb at google.com Sun Mar 21 16:23:28 2010 From: martinrb at google.com (Martin Buchholz) Date: Sun, 21 Mar 2010 09:23:28 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA602F9.7000408@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA564BB.9090901@gmx.de> <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com> <4BA602F9.7000408@gmx.de> Message-ID: <1ccfd1c11003210923q27e4d8bdj913350d8bca58195@mail.gmail.com> On Sun, Mar 21, 2010 at 04:28, Ulf Zibis wrote: > On Sat, Mar 20, 2010 at 17:13, Ulf Zibis wrote: > I don't think it's a performance problem in the real world. > > > Hm, if someone uses: > if (Character.isBMPCodePoint(codePoint)) > ...; > else if (Character.isSupplementaryCodePoint(codePoint)) // instead of isValidCodePoint() > ...; > else > ...; > he will lose up to 50% performance, as you can see in my benchmark of > isSuppCPAlaMartin(). Only if their data is full of supplementary characters. > We don't usually put such performance information in the javadoc. > > > In class StringBuilder: > "Where possible, it is recommended that this class be used in preference to > StringBuffer as it will be faster under most implementations."
> > java.util.List: > Note that these operations may execute in time proportional to the index > value for some implementations (the LinkedList class, for example). > > ByteBuffer#get(byte[],int,int): > In other words, an invocation of this method of the form > src.get(dst, off, len) has exactly the same effect as the loop > > for (int i = off; i < off + len; i++) > dst[i] = src.get(); > > except that it first checks that there are sufficient bytes in this buffer > and it is potentially much more efficient. In the above, the performance is a raison d'être of the API, which real users should consider when choosing an API. > Anyway, even if isSupplementaryCodePoint() is used in isolation, my code will > help the JIT to use 2-byte shifted addressing and a shorter 2-byte immediate value > for the compare, but yes, the JIT should be able to catch that without this > help. But in that case, we could also stay with the old implementations of > isBMPCodePoint and isValidCodePoint. Again, performance with BMP characters is infinitely more important than performance with supplementary characters. Martin From martinrb at google.com Sun Mar 21 17:38:04 2010 From: martinrb at google.com (Martin Buchholz) Date: Sun, 21 Mar 2010 10:38:04 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA620A5.5020608@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> <4BA61C40.8060608@gmx.de> <4BA620A5.5020608@gmx.de> Message-ID: <1ccfd1c11003211038p5681441cqf7ba500b9f8079e5@mail.gmail.com> On Sun, Mar 21, 2010 at 06:35, Ulf Zibis wrote: >> >>> >>> On Sun, Mar 21, 2010 at 03:24, Ulf Zibis wrote: >>>> >>>> On 21.03.2010 09:05, Martin Buchholz wrote: >>>>> >>>>> On Sat, Mar 20, 2010 at 15:50, Ulf Zibis
wrote: >>>> >>>> I think we should not define a distinct method for this once-used >>>> 3-liner: >>>> for (; i < max-1; i++) >>>> if (v[i] == high && v[i+1] == low) >>>> return i - offset; >>>> >>>> HotSpot's resources should not be over-stressed with inlining such things, >>>> leaving >>>> more reserves for more important things. >>> >>> On the contrary - >>> normally the above code snippet will rarely be executed, >>> and so will normally not be inlined into the caller, >>> which makes it easier for hotspot to inline >>> the caller into its caller. Separate cold code into >>> separate methods. >> >> Thanks, I got the idea. >> >> But isn't the push-call-pop-return overhead comparable with those 3 lines >> here, not to forget the repeated cache-3-values-once-more? Even if I'm wrong, and this cold code is actually hot, I don't think there will be a big performance loss. The method call is outside the loop. > And additionally, the slow, rarely used branch would be set in stone, even if > the inline threshold is reached after some time, as the JIT, AFAIK, can't > count the frequency of compiled code usage. It's certainly the intent that we will have multiple levels of compilation ("tiered compilation") and profiling would be enabled on at least some compiled code.
Martin From martinrb at google.com Sun Mar 21 19:39:17 2010 From: martinrb at google.com (Martin Buchholz) Date: Sun, 21 Mar 2010 12:39:17 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> Message-ID: <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> On Sun, Mar 21, 2010 at 00:56, Martin Buchholz wrote: > Study also: > http://code.google.com/p/google-collections/source/browse/trunk/src/com/google/common/base/Preconditions.java Sorry, the best (most recent) version of Preconditions to study is here: http://code.google.com/p/guava-libraries/source/browse/trunk/src/com/google/common/base/Preconditions.java especially this comment: /* * All recent hotspots (as of 2009) *really* like to have the natural code * * if (guardExpression) { * throw new BadException(messageExpression); * } * * refactored so that messageExpression is moved to a separate * String-returning method. * * if (guardExpression) { * throw new BadException(badMsg(...)); * } * * The alternative natural refactorings into void or Exception-returning * methods are much slower. This is a big deal - we're talking factors of * 2-8 in microbenchmarks, not just 10-20%. (This is a hotspot optimizer * bug, which should be fixed, but that's a separate, big project). * * The coding pattern above is heavily used in java.util, e.g. in ArrayList. * There is a RangeCheckMicroBenchmark in the JDK that was used to test this. 
* * But the methods in this class want to throw different exceptions, * depending on the args, so it appears that this pattern is not directly * applicable. But we can use the ridiculous, devious trick of throwing an * exception in the middle of the construction of another exception. * Hotspot is fine with that. */ Martin From christopher.hegarty at sun.com Mon Mar 22 12:00:30 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Mon, 22 Mar 2010 12:00:30 +0000 Subject: hg: jdk7/tl/jdk: 6632169: HttpClient and HttpsClient should not try to reverse lookup IP address of a proxy server Message-ID: <20100322120310.8BCBE4465E@hg.openjdk.java.net> Changeset: c40572afb29e Author: chegar Date: 2010-03-22 11:55 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/c40572afb29e 6632169: HttpClient and HttpsClient should not try to reverse lookup IP address of a proxy server Reviewed-by: michaelm ! src/share/classes/sun/net/www/protocol/https/HttpsClient.java From Ulf.Zibis at gmx.de Mon Mar 22 13:57:46 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Mon, 22 Mar 2010 14:57:46 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210923q27e4d8bdj913350d8bca58195@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA564BB.9090901@gmx.de> <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com> <4BA602F9.7000408@gmx.de> <1ccfd1c11003210923q27e4d8bdj913350d8bca58195@mail.gmail.com> Message-ID: <4BA7775A.2080506@gmx.de> Am 21.03.2010 17:23, schrieb Martin Buchholz: > On Sun, Mar 21, 2010 at 04:28, Ulf Zibis wrote: > >> On Sat, Mar 20, 2010 at 17:13, Ulf Zibis wrote: >> > >> I don't think it's a performance problem 
in the real world. >> >> >> Hm, if someone uses: >> if (Character.isBMPCodePoint(codePoint)) >> ...; >> else if (Character.isSupplementaryCodePoint(codePoint)) // instead of isValidCodePoint() >> ...; >> else >> ...; >> he will lose up to 50% performance, as you can see in my benchmark of >> isSuppCPAlaMartin(). >> > Only if their data is full of supplementary characters. > Yes, but we don't know anything about the purpose of all the code written out there in the world, so why not provide the best performance, or at least give a hint in the docs, if it doesn't cost anything? > >> We don't usually put such performance information in the javadoc. >> >> >> In class StringBuilder: >> "Where possible, it is recommended that this class be used in preference to >> StringBuffer as it will be faster under most implementations." >> >> java.util.List: >> Note that these operations may execute in time proportional to the index >> value for some implementations (the LinkedList class, for example). >> >> ByteBuffer#get(byte[],int,int): >> In other words, an invocation of this method of the form >> src.get(dst, off, len) has exactly the same effect as the loop >> >> for (int i = off; i < off + len; i++) >> dst[i] = src.get(); >> >> except that it first checks that there are sufficient bytes in this buffer >> and it is potentially much more efficient. >> > In the above, the performance is a raison d'être of the API, > which real users should consider when choosing an API. > Oh, on parle français. Je l'aime beaucoup. > >> Anyway, even if isSupplementaryCodePoint() is used in isolation, my code will >> help the JIT to use 2-byte shifted addressing and a shorter 2-byte immediate value >> for the compare, but yes, the JIT should be able to catch that without this >> help. But in that case, we could also stay with the old implementations of >> isBMPCodePoint and isValidCodePoint. >> > Again, performance with BMP characters is infinitely more important > than performance with supplementary characters. > You are right.
But I can't see any reason why the fast supplementary version would harm the BMP performance. -Ulf From forax at univ-mlv.fr Mon Mar 22 14:33:30 2010 From: forax at univ-mlv.fr (Rémi Forax) Date: Mon, 22 Mar 2010 15:33:30 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA7775A.2080506@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA564BB.9090901@gmx.de> <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com> <4BA602F9.7000408@gmx.de> <1ccfd1c11003210923q27e4d8bdj913350d8bca58195@mail.gmail.com> <4BA7775A.2080506@gmx.de> Message-ID: <4BA77FBA.9060308@univ-mlv.fr> On 22/03/2010 14:57, Ulf Zibis wrote: > On 21.03.2010 17:23, Martin Buchholz wrote: >> On Sun, Mar 21, 2010 at 04:28, Ulf Zibis wrote: >>> On Sat, Mar 20, 2010 at 17:13, Ulf Zibis wrote: >>> I don't think it's a performance problem in the real world. >>> >>> >>> Hm, if someone uses: >>> if (Character.isBMPCodePoint(codePoint)) >>> ...; >>> else if (Character.isSupplementaryCodePoint(codePoint)) // instead of isValidCodePoint() >>> ...; >>> else >>> ...; >>> he will lose up to 50% performance, as you can see in my benchmark of >>> isSuppCPAlaMartin(). >> Only if their data is full of supplementary characters. > > Yes, but we don't know anything about the purpose of all the code written > out there in the world, so why not provide the best performance, or at least > give a hint in the docs, if it doesn't cost anything? > >>> We don't usually put such performance information in the javadoc.
>>> >>> >>> In class StringBuilder: >>> "Where possible, it is recommended that this class be used in >>> preference to >>> StringBuffer as it will be faster under most implementations." >>> >>> java.util.List: >>> Note that these operations may execute in time proportional to the >>> index >>> value for some implementations (the LinkedList class, for example). >>> >>> ByteBuffer#get(byte[],int,int): >>> In other words, an invocation of this method of the form >>> src.get(dst, off, len) has exactly the same effect as the loop >>> >>> for (int i = off; i < off + len; i++) >>> dst[i] = src.get(); >>> >>> except that it first checks that there are sufficient bytes in this >>> buffer >>> and it is potentially much more efficient. >> In the above, the performance is a raison d'être of the API, >> which real users should consider when choosing an API. > > Oh, on parle français. Je l'aime beaucoup. Totally off topic, but you mean: "j'aime beaucoup". "Je l'aime beaucoup" means "I love him/her a lot". > >>> Anyway, even if isSupplementaryCodePoint() is used in isolation, my code >>> will >>> help the JIT to use 2-byte shifted addressing and a shorter 2-byte >>> immediate value >>> for the compare, but yes, the JIT should be able to catch that without this >>> help. But in that case, we could also stay with the old implementations >>> of >>> isBMPCodePoint and isValidCodePoint. >> Again, performance with BMP characters is infinitely more important >> than performance with supplementary characters. > > You are right. But I can't see any reason why the fast supplementary > version would harm the BMP performance.
> > -Ulf > R?mi From Ulf.Zibis at gmx.de Mon Mar 22 14:34:57 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Mon, 22 Mar 2010 15:34:57 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003210916rad35d31wb0501f9bf960b07@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <4BA60A40.9050600@gmx.de> <1ccfd1c11003210916rad35d31wb0501f9bf960b07@mail.gmail.com> Message-ID: <4BA78011.2070504@gmx.de> Am 21.03.2010 17:16, schrieb Martin Buchholz: > On Sun, Mar 21, 2010 at 05:00, Ulf Zibis wrote: > >>> On Sat, Mar 20, 2010 at 14:52, Ulf Zibis wrote: >>> >>> >>>> - A little "bug" in javadoc: >>>> @exception ArrayIndexOutOfBoundsException >>>> instead IndexOutOfBoundsException >>>> >>>> >>> Not a bug. >>> >>> >> Yes, but decreases the users capabilities catching exceptions more precise >> and flexible. >> > There is a debate about whether to reuse existing exception classes > or to throw class-specific subclasses. IMO, IOOBE is a sufficiently expressive > exception that I might have used just that, with expressive detail messages. > I'm with you. Especially StringIndexOutOfBoundsException appears as superfluous sugar to me. But we have it in the docs, so there is no way to get rid of it. What do you think about to refactor most IOOBEs in String related classes to SIOOBEs? It would stay compatible to old Software, which still catches IOOBEs, but would look more straight, tidy and clean and fix the below mentioned bug. -Ulf > But that's only a consideration when designing new API or a new platform. > Old API must stay unchanged, for compatibility. 
> > >> Imagine, a method would throw an IndexOutOfBoundsException for some reason >> and too calls Character.toChars(). The caller of such a method could >> distinguish, where the exception would come from, and have separate catch >> blocks. But if not documented ... :-( >> >> In extreme, following too would not be a bug in your sense: >> @exception Exception >> >> I became sensitive on this, as I have seen real bugs in >> AbstractStringBuilder vice versa, where methods actually throw >> IndexOutOfBoundsExceptions, but their javadoc states StringIndexOutOf >> BoundsException. >> > Now that's a real bug. > > Martin > > > From Ulf.Zibis at gmx.de Mon Mar 22 14:45:32 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Mon, 22 Mar 2010 15:45:32 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003211038p5681441cqf7ba500b9f8079e5@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> <4BA61C40.8060608@gmx.de> <4BA620A5.5020608@gmx.de> <1ccfd1c11003211038p5681441cqf7ba500b9f8079e5@mail.gmail.com> Message-ID: <4BA7828C.6060109@gmx.de> Am 21.03.2010 18:38, schrieb Martin Buchholz: > On Sun, Mar 21, 2010 at 06:35, Ulf Zibis wrote: > >>> >>>> On Sun, Mar 21, 2010 at 03:24, Ulf Zibis wrote: >>>> >>>>> Am 21.03.2010 09:05, schrieb Martin Buchholz: >>>>> >>>>>> On Sat, Mar 20, 2010 at 15:50, Ulf Zibis wrote: >>>>>> >>>>> I think, we should not define a distinct method for this once-used >>>>> 3-liner: >>>>> for (; i< max-1; i++) >>>>> if (v[i] == high&& v[i+1] == low) >>>>> return i - offset; >>>>> >>>>> HotSpots resources should not be over-stressed to inline such things, >>>>> having >>>>> more reserves for more important things. 
>>>>> >>>> On the contrary - >>>> normally the above code snippet will rarely be executed, >>>> and so will normally not be inlined into the caller, >>>> which makes it easier for hotspot to inline >>>> the caller into its caller. Separate cold code into >>>> separate methods. >>>> >>> Thanks, I got the idea. >>> >>> But Isn't the push-call-pop-return overhead comparable with those 3 lines >>> here, not to forget the repeated cache-3-values-once-more? >>> > Even if I'm wrong, and this cold code is actually hot, > I don't think there will be a big performance loss. > The method call is outside the loop. > What about at least reusing the cached values from calling method via indexOfSupplementary(ch, fromIndex, value, offset, max - 1) ? > >> And additionally the slow rarely used branch would stay in stone, even if >> after some time, the inline threshhold becomes reached, as JIT, AFAIK, can't >> count the frequency of compiled code usage. >> > It's certainly the intent that we will have multiple levels of > compilation ("tiered compilation") and profiling would be > enabled on at least some compiled code. > That's an interesting option. 
-Ulf

From Ulf.Zibis at gmx.de Mon Mar 22 14:53:57 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Mon, 22 Mar 2010 15:53:57 +0100
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA77FBA.9060308@univ-mlv.fr>
References: <4A95079A.8080803@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA564BB.9090901@gmx.de> <1ccfd1c11003210020l1c200f89h95541d120fdc08cb@mail.gmail.com> <4BA602F9.7000408@gmx.de> <1ccfd1c11003210923q27e4d8bdj913350d8bca58195@mail.gmail.com> <4BA7775A.2080506@gmx.de> <4BA77FBA.9060308@univ-mlv.fr>
Message-ID: <4BA78485.2020505@gmx.de>

Am 22.03.2010 15:33, schrieb Rémi Forax:
> Le 22/03/2010 14:57, Ulf Zibis a écrit :
>>
>> Oh, on parle français. Je l'aime beaucoup.
>
> Totally off topic but

You're free to add some opinion to the main topic. ;-)

> You mean: "j'aime beaucoup".
> je l'aime beaucoup means I love him/her a lot.

... ou "le Français" ?
Well, I'm afraid I can't compete with a vrais Français.
-Ulf From Ulf.Zibis at gmx.de Mon Mar 22 15:29:44 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Mon, 22 Mar 2010 16:29:44 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> Message-ID: <4BA78CE8.9020107@gmx.de> Am 21.03.2010 20:39, schrieb Martin Buchholz: > On Sun, Mar 21, 2010 at 00:56, Martin Buchholz wrote: > > >> Study also: >> http://code.google.com/p/google-collections/source/browse/trunk/src/com/google/common/base/Preconditions.java >> > Sorry, the best (most recent) version of Preconditions to study is here: > > http://code.google.com/p/guava-libraries/source/browse/trunk/src/com/google/common/base/Preconditions.java > > especially this comment: > Thanks for the update. I'm not sure if I understand right the below comment. Does it mean, that inlining the message from a constant is less fast than from a call on badMsg()? -Ulf > /* > * All recent hotspots (as of 2009) *really* like to have the natural code > * > * if (guardExpression) { > * throw new BadException(messageExpression); > * } > * > * refactored so that messageExpression is moved to a separate > * String-returning method. > * > * if (guardExpression) { > * throw new BadException(badMsg(...)); > * } > * > * The alternative natural refactorings into void or Exception-returning > * methods are much slower. This is a big deal - we're talking factors of > * 2-8 in microbenchmarks, not just 10-20%. 
(This is a hotspot optimizer > * bug, which should be fixed, but that's a separate, big project). > * > * The coding pattern above is heavily used in java.util, e.g. in ArrayList. > * There is a RangeCheckMicroBenchmark in the JDK that was used to test this. > * > * But the methods in this class want to throw different exceptions, > * depending on the args, so it appears that this pattern is not directly > * applicable. But we can use the ridiculous, devious trick of throwing an > * exception in the middle of the construction of another exception. > * Hotspot is fine with that. > */ > > > Martin > > > From Xueming.Shen at Sun.COM Tue Mar 23 20:32:42 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Tue, 23 Mar 2010 12:32:42 -0800 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> Message-ID: <4BA9256A.2020602@sun.com> 6937112: String.lastIndexOf confused by unpaired trailing surrogate Kinda guess that it might bring us some performance benefit to separate the supplementary handling code out into its own method (to help the not that smart hotspot:-)?), but doubt it is really something worth doing. At least you dont have to have the redundant value/offset=this.value/offset. Seems like you started to attach the "final" keyword to all "constants"...guess it's a hint to help smart vm for further optimization. Is the hotspot doing something special in simple case like below? -Sherman Martin Buchholz wrote: > For a change, here's an actual plain old "incorrect result" bug fix > for String.lastIndexOf > > Sherman, please file a bug and review. > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/ > > Also includes our usual performance-oriented fiddling. 
> > public class LastIndexOf { > public static void main(String[] args) { > int ch = 0x10042; > char[] bug = new char[3]; > Character.toChars(ch, bug, 0); > bug[2] = bug[0]; > System.out.println(new String(bug).lastIndexOf(ch)); > bug[2] = '!'; > System.out.println(new String(bug).lastIndexOf(ch)); > } > } > ==> javac -source 1.6 -Xlint:all LastIndexOf.java > ==> java -esa -ea LastIndexOf > -1 > 0 > From Xueming.Shen at Sun.COM Tue Mar 23 20:37:07 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Tue, 23 Mar 2010 12:37:07 -0800 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA9256A.2020602@sun.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> Message-ID: <4BA92673.3030200@sun.com> CCed Masayoshi. Masayoshi, Martin and Ulf are doing some "small" overhaul on those supplementary methods guess you might be interested to review the change. Martin, Ulf, please CC Masayoshi if you are touching the supplementary handling code. -Sherman Xueming Shen wrote: > 6937112: String.lastIndexOf confused by unpaired trailing surrogate > > Kinda guess that it might bring us some performance benefit to > separate the supplementary handling > code out into its own method (to help the not that smart hotspot:-)?), > but doubt it is really something > worth doing. At least you dont have to have the redundant > value/offset=this.value/offset. > > Seems like you started to attach the "final" keyword to all > "constants"...guess it's a hint to help smart > vm for further optimization. Is the hotspot doing something special in > simple case like below? > > -Sherman > > Martin Buchholz wrote: >> For a change, here's an actual plain old "incorrect result" bug fix >> for String.lastIndexOf >> >> Sherman, please file a bug and review. >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/ >> >> Also includes our usual performance-oriented fiddling. 
>>
>> public class LastIndexOf {
>>     public static void main(String[] args) {
>>         int ch = 0x10042;
>>         char[] bug = new char[3];
>>         Character.toChars(ch, bug, 0);
>>         bug[2] = bug[0];
>>         System.out.println(new String(bug).lastIndexOf(ch));
>>         bug[2] = '!';
>>         System.out.println(new String(bug).lastIndexOf(ch));
>>     }
>> }
>> ==> javac -source 1.6 -Xlint:all LastIndexOf.java
>> ==> java -esa -ea LastIndexOf
>> -1
>> 0
>>
>
>

From martinrb at google.com Mon Mar 22 20:05:19 2010
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 22 Mar 2010 13:05:19 -0700
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <4BA9256A.2020602@sun.com>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com>
Message-ID: <1ccfd1c11003221305j3bc6a0eahb11ad3c5ce544732@mail.gmail.com>

On Tue, Mar 23, 2010 at 13:32, Xueming Shen wrote:
> 6937112: String.lastIndexOf confused by unpaired trailing surrogate
>
> Kinda guess that it might bring us some performance benefit to separate the
> supplementary handling
> code out into its own method (to help the not that smart hotspot:-)?), but
> doubt it is really something
> worth doing. At least you dont have to have the redundant
> value/offset=this.value/offset.

Yes, this is an "extreme" optimization, but one that is used pervasively in java.util.concurrent (Doug Lea's influence) and suitable for performance-critical methods. Its only downside is the increase in size of source code. (bytecode is actually smaller)

> Seems like you started to attach the "final" keyword to all
> "constants"...guess it's a hint to help smart
> vm for further optimization. Is the hotspot doing something special in
> simple case like below?

The "final" is there purely for software engineering reasons, so that people don't make the common mistake of modifying a field cached in a local, which would have no effect.
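A minimal sketch of the idiom described above: fields are read once into "final" locals at the top of a method, and the "final" guards against reassigning the local in the belief that the field changes too. The class and its members are hypothetical stand-ins, not the actual java.lang.String source.

```java
// Hypothetical stand-in for a String-like class; NOT the JDK implementation.
final class CachedFieldDemo {
    private final char[] value;
    private final int offset;
    private final int count;

    CachedFieldDemo(String s) {
        this.value = s.toCharArray();
        this.offset = 0;
        this.count = value.length;
    }

    int indexOfChar(char ch) {
        // Each field is read once into a local. Declaring the locals final
        // means an accidental "value = ..." inside the method fails to
        // compile instead of silently changing only the local copy.
        final char[] value = this.value;
        final int offset = this.offset;
        final int max = offset + count;
        for (int i = offset; i < max; i++) {
            if (value[i] == ch) {
                return i - offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(new CachedFieldDemo("hello").indexOfChar('l'));
    }
}
```

Whether the local copy helps performance depends on the VM; the sketch only illustrates the software-engineering point about "final".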
Martin

> -Sherman
>
> Martin Buchholz wrote:
>>
>> For a change, here's an actual plain old "incorrect result" bug fix
>> for String.lastIndexOf
>>
>> Sherman, please file a bug and review.
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf/
>>
>> Also includes our usual performance-oriented fiddling.
>>
>> public class LastIndexOf {
>>     public static void main(String[] args) {
>>         int ch = 0x10042;
>>         char[] bug = new char[3];
>>         Character.toChars(ch, bug, 0);
>>         bug[2] = bug[0];
>>         System.out.println(new String(bug).lastIndexOf(ch));
>>         bug[2] = '!';
>>         System.out.println(new String(bug).lastIndexOf(ch));
>>     }
>> }
>> ==> javac -source 1.6 -Xlint:all LastIndexOf.java
>> ==> java -esa -ea LastIndexOf
>> -1
>> 0
>>
>
>

From martinrb at google.com Mon Mar 22 20:08:31 2010
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 22 Mar 2010 13:08:31 -0700
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <4BA7828C.6060109@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> <4BA61C40.8060608@gmx.de> <4BA620A5.5020608@gmx.de> <1ccfd1c11003211038p5681441cqf7ba500b9f8079e5@mail.gmail.com> <4BA7828C.6060109@gmx.de>
Message-ID: <1ccfd1c11003221308q1644bbb5w38141fe55a66ba8e@mail.gmail.com>

On Mon, Mar 22, 2010 at 07:45, Ulf Zibis wrote:
> Am 21.03.2010 18:38, schrieb Martin Buchholz:
>> Even if I'm wrong, and this cold code is actually hot,
>> I don't think there will be a big performance loss.
>> The method call is outside the loop.
>>
>
> What about at least reusing the cached values from calling method via
> indexOfSupplementary(ch, fromIndex, value, offset, max - 1) ?

No. We're optimizing for BMP.
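The trade-off under discussion can be sketched roughly as follows. This is an illustration under assumptions, not the webrev code: the class is a hypothetical stand-in, and only the name indexOfSupplementary is taken from the thread. The BMP search stays in the hot method, while the rarely used supplementary search lives in a separate method that re-reads the field instead of receiving the caller's cached locals.

```java
// Hypothetical sketch of "separate cold code into separate methods";
// NOT the actual java.lang.String implementation.
final class ColdPathDemo {
    private final char[] value;

    ColdPathDemo(String s) {
        this.value = s.toCharArray();
    }

    int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            return -1;
        }
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // Hot path: BMP code point (a negative ch simply never matches).
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        }
        // Cold path: one call, outside any loop.
        return indexOfSupplementary(ch, fromIndex);
    }

    // Rarely executed, so kept out of the hot method's bytecode.
    private int indexOfSupplementary(int ch, int fromIndex) {
        if (Character.isValidCodePoint(ch)) {
            final char[] value = this.value;
            final char[] pair = Character.toChars(ch);
            final char hi = pair[0];
            final char lo = pair[1];
            final int max = value.length - 1;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == hi && value[i + 1] == lo) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

Passing value, offset and max into the cold method, as Ulf suggests, would save the repeated field read there at the cost of a longer argument list; the sketch follows the simpler variant.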
Martin

From martinrb at google.com Mon Mar 22 22:03:03 2010
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 22 Mar 2010 15:03:03 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA78011.2070504@gmx.de>
References: <4A95079A.8080803@gmx.de> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <4BA60A40.9050600@gmx.de> <1ccfd1c11003210916rad35d31wb0501f9bf960b07@mail.gmail.com> <4BA78011.2070504@gmx.de>
Message-ID: <1ccfd1c11003221503r46e6bb78g241e2b07ff7f1b3c@mail.gmail.com>

On Mon, Mar 22, 2010 at 07:34, Ulf Zibis wrote:
> Am 21.03.2010 17:16, schrieb Martin Buchholz:
>> There is a debate about whether to reuse existing exception classes
>> or to throw class-specific subclasses. IMO, IOOBE is a sufficiently
>> expressive
>> exception that I might have used just that, with expressive detail
>> messages.
>>
>
> I'm with you. Especially StringIndexOutOfBoundsException appears as
> superfluous sugar to me. But we have it in the docs, so there is no way to
> get rid of it.
> What do you think about to refactor most IOOBEs in String related classes to
> SIOOBEs? It would stay compatible to old Software, which still catches
> IOOBEs, but would look more straight, tidy and clean and fix the below
> mentioned bug.

Every change is an incompatible change, with a risk/benefit tradeoff. IMO there is no change to the exceptions thrown, or declared to be thrown, or to their detail messages, in the string classes that is worth the risk of incompatible change.
(with the exception of when the implementation contradicts the spec, which is worth fixing)

Martin

From martinrb at google.com Mon Mar 22 22:08:08 2010
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 22 Mar 2010 15:08:08 -0700
Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character
In-Reply-To: <4BA78CE8.9020107@gmx.de>
References: <4A95079A.8080803@gmx.de> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> <4BA78CE8.9020107@gmx.de>
Message-ID: <1ccfd1c11003221508n6180fd1dk862d47a2f27f42e2@mail.gmail.com>

On Mon, Mar 22, 2010 at 08:29, Ulf Zibis wrote:
> Am 21.03.2010 20:39, schrieb Martin Buchholz:
>>
>> On Sun, Mar 21, 2010 at 00:56, Martin Buchholz
>> wrote:
>>
>>
>>>
>>> Study also:
>> http://code.google.com/p/guava-libraries/source/browse/trunk/src/com/google/common/base/Preconditions.java
>>
>> especially this comment:
>>
>
> Thanks for the update. I'm not sure if I understand right the below comment.
> Does it mean, that inlining the message from a constant is less fast than
> from a call on badMsg()?

I'm not sure I understand exactly, but as the comment says, always make your error-checking code look like this:

 *
 *   if (guardExpression) {
 *      throw new BadException(badMsg(...));
 *   }
 *

although in String it's not so important because there's no String concatenation, which is a notable cause of cold bytecode bloat.
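A minimal, self-contained sketch of that guard pattern. Only the shape (guard, then throw with a message from a String-returning helper) comes from the quoted Preconditions comment; the class, the field and the message format are hypothetical.

```java
// Hypothetical illustration of the guard pattern from the quoted comment.
final class GuardDemo {
    private final int[] elements = new int[4];

    int get(int index) {
        if (index < 0 || index >= elements.length) {
            // The hot method contains only the test, one call and the throw.
            throw new IndexOutOfBoundsException(badMsg(index));
        }
        return elements[index];
    }

    // Cold helper: the string concatenation (and the bytecode it expands
    // to) stays out of the body of get().
    private String badMsg(int index) {
        return "Index: " + index + ", Size: " + elements.length;
    }
}
```

With this layout, the caller's bytecode stays small whether or not the message involves concatenation.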
Martin From martinrb at google.com Mon Mar 22 22:27:37 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 22 Mar 2010 15:27:37 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BA78CE8.9020107@gmx.de> References: <4A95079A.8080803@gmx.de> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> <4BA78CE8.9020107@gmx.de> Message-ID: <1ccfd1c11003221527q29f61f7u700344a99d293ceb@mail.gmail.com> Ulf, I'd like to start a mq patch containing changes to the String exception handling in the string classes. Please provide me with a patch that uses the blessed conventional names from Preconditions.java. For the version that checks an offset and length for containment within a larger sequence, I would prefer the name "checkSubsequence", for example private static void checkSubsequence(int start, int len, int size) Please make sure that there are sufficient tests in test/java/lang/String to ensure that you are not inadvertently making changes to the exceptions thrown. I note that test/java/lang/String/{Exceptions,Supplementary} do try to test exception handling, but do not appear to test for the *exact* class of the exception thrown, nor the detail message of the exception. When those tests were written, compatibility was less important. Please adapt my test/java/util/ArrayList/RangeCheckMicroBenchmark.java to test string classes instead. There is a good chance that you can demonstrate a performance improvement on ordinary String operations! 
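For illustration only, a helper with the proposed signature might look like the sketch below. The signature is the one given above; the body, the choice of IndexOutOfBoundsException, the message format and the helper name badSubsequence are assumptions, and the method is package-private here (rather than private) only so that it can be exercised from outside the class.

```java
// Hypothetical sketch; the real patch may differ in exception class,
// detail message and placement.
final class SubsequenceCheckDemo {

    // Checks that [start, start + len) lies within [0, size).
    // Assumes size >= 0, so "size - len" cannot overflow once len >= 0.
    static void checkSubsequence(int start, int len, int size) {
        if (start < 0 || len < 0 || start > size - len) {
            throw new IndexOutOfBoundsException(badSubsequence(start, len, size));
        }
    }

    // Cold message construction, kept out of the checking method.
    private static String badSubsequence(int start, int len, int size) {
        return "start: " + start + ", len: " + len + ", size: " + size;
    }
}
```

Comparing start against size - len, instead of start + len against size, avoids integer overflow for large arguments.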
Thanks, Martin From martinrb at google.com Mon Mar 22 22:36:20 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 22 Mar 2010 15:36:20 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA92673.3030200@sun.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> Message-ID: <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> Masayoshi, Ulf and I are working on a few changes to supplementary character handling. Character.isSurrogate has already gone in. The following are in the pipeline: 6934268: Better implementation of Character.isValidCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint 6934265: Add public method Character.isBMPCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint [mq]: isBMPCodePoint2 http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2 6937112: String.lastIndexOf confused by unpaired trailing surrogate http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf In addition, Ulf and I would like to add char Character.highSurrogate(int codePoint) char Character.lowSurrogate(int codePoint) Ulf, please provide me with your latest patch for Character.highSurrogate and I will add it to the pipeline. Martin On Tue, Mar 23, 2010 at 13:37, Xueming Shen wrote: > CCed Masayoshi. > > Masayoshi, Martin and Ulf are doing some "small" overhaul on those > supplementary methods guess > you might be interested to review the change. > > Martin, Ulf, please CC Masayoshi if you are touching the supplementary > handling code. > From Ulf.Zibis at gmx.de Mon Mar 22 23:02:03 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 00:02:03 +0100 Subject: Kinda ? 
In-Reply-To: <4BA9256A.2020602@sun.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> Message-ID: <4BA7F6EB.6040804@gmx.de> Can somebody betray the sense of "Kinda" to me? -Ulf From Paul.Hohensee at Sun.COM Mon Mar 22 23:13:00 2010 From: Paul.Hohensee at Sun.COM (Paul Hohensee) Date: Mon, 22 Mar 2010 19:13:00 -0400 Subject: Kinda ? In-Reply-To: <4BA7F6EB.6040804@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA7F6EB.6040804@gmx.de> Message-ID: <4BA7F97C.5070202@sun.com> "in a way" plus "somewhat", as in "it's kinda bad" == "in a way, it's somewhat bad". On 3/22/10 7:02 PM, Ulf Zibis wrote: > Can somebody betray the sense of "Kinda" to me? > > -Ulf > > From Ulf.Zibis at gmx.de Mon Mar 22 23:14:21 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 00:14:21 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003221308q1644bbb5w38141fe55a66ba8e@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> <4BA61C40.8060608@gmx.de> <4BA620A5.5020608@gmx.de> <1ccfd1c11003211038p5681441cqf7ba500b9f8079e5@mail.gmail.com> <4BA7828C.6060109@gmx.de> <1ccfd1c11003221308q1644bbb5w38141fe55a66ba8e@mail.gmail.com> Message-ID: <4BA7F9CD.7090601@gmx.de> Am 22.03.2010 21:08, schrieb Martin Buchholz: > On Mon, Mar 22, 2010 at 07:45, Ulf Zibis wrote: > >> Am 21.03.2010 18:38, schrieb Martin Buchholz: >> > >>> Even if I'm wrong, and this cold code is actually hot, >>> I don't think there will be a big performance loss. >>> The method call is outside the loop. 
>>> >>> >> What about at least reusing the cached values from calling method via >> indexOfSupplementary(ch, fromIndex, value, offset, max - 1) ? >> > No. We're optimizing for BMP. > There would be no harm on BMP case speed. HotSpot wouldn't copy the values to stack if (1) ch is a BMP character and (2) indexOfSupplementary() becomes inlined. I think Sherman is right in "dont have to have the redundant value/offset=this.value/offset". -Ulf From Ulf.Zibis at gmx.de Mon Mar 22 23:17:42 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 00:17:42 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA9256A.2020602@sun.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> Message-ID: <4BA7FA96.7070102@gmx.de> Sherman, can you have a look on your PC clock. I guess it's dis-adjusted. -Ulf From Xueming.Shen at Sun.COM Mon Mar 22 23:39:20 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Mon, 22 Mar 2010 16:39:20 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA7FA96.7070102@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA7FA96.7070102@gmx.de> Message-ID: <4BA7FFA8.7000302@sun.com> Thanks Ulf! Fixed:-) Ulf Zibis wrote: > Sherman, can you have a look on your PC clock. I guess it's dis-adjusted. > > -Ulf > > From weijun.wang at sun.com Tue Mar 23 02:42:26 2010 From: weijun.wang at sun.com (weijun.wang at sun.com) Date: Tue, 23 Mar 2010 02:42:26 +0000 Subject: hg: jdk7/tl/jdk: 6586707: NTLM authentication with proxy fails Message-ID: <20100323024302.3CD6A4472E@hg.openjdk.java.net> Changeset: 31dcf23042f9 Author: weijun Date: 2010-03-23 10:41 +0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/31dcf23042f9 6586707: NTLM authentication with proxy fails Reviewed-by: chegar ! 
src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java From Ulf.Zibis at gmx.de Tue Mar 23 11:34:51 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 12:34:51 +0100 Subject: Kinda ? In-Reply-To: <4BA85AFA.70005@paradise.net.nz> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA7F6EB.6040804@gmx.de> <4BA7F97C.5070202@sun.com> <4BA85AFA.70005@paradise.net.nz> Message-ID: <4BA8A75B.80904@gmx.de> Much thanks for your kind answers. I missed it on my beloved LEO . -Ulf Am 23.03.2010 07:08, schrieb Bruce Chapman & Barbara Carey: > Paul Hohensee wrote: >> "in a way" plus "somewhat", as in "it's kinda bad" == "in a way, it's >> somewhat bad". >> >> On 3/22/10 7:02 PM, Ulf Zibis wrote: >>> Can somebody betray the sense of "Kinda" to me? >>> >>> -Ulf >>> >>> >> > a spoken contraction of "kind of" (similar meaning to sorta a > contraction of sort-of) > > nothing to do with children (kinder) although you might sometimes see > it spelt that way too. > > Bruce > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Ulf.Zibis at gmx.de Tue Mar 23 12:22:29 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 13:22:29 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8E3DA3.7090902@gmx.de> <1ccfd1c11003030806h45c16691p97961cb1003eba55@mail.gmail.com> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> Message-ID: <4BA8B285.1040403@gmx.de> Am 13.03.2010 00:04, schrieb Martin Buchholz: > >> Remembers me that some months ago I prepared a beautified version of >> Character's source (things like above, replacing against {@code}, >> indentation inconsistencies etc.) Would there be interest to provide such a >> patch ? >> > Please provide URL of patch. > > All this work I had done here: https://bugs.openjdk.java.net/show_bug.cgi?id=100104 I suggest to start with patch "Cosmetics 1", and then go further. Unfortunately the patches don't contain our latest bit twiddling, but I think, "Cosmetics 1" / "2" could be done first, and after we could include the bit twiddling. 
Ulf From Ulf.Zibis at gmx.de Tue Mar 23 12:58:01 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 13:58:01 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA7F9CD.7090601@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA546C1.5040201@gmx.de> <4BA55149.5070603@gmx.de> <1ccfd1c11003210105s12220ffcrf139c2c59d33f84d@mail.gmail.com> <4BA5F3F1.8080609@gmx.de> <1ccfd1c11003210535l1505a0bfoe9f14c6ad4b07c42@mail.gmail.com> <4BA61C40.8060608@gmx.de> <4BA620A5.5020608@gmx.de> <1ccfd1c11003211038p5681441cqf7ba500b9f8079e5@mail.gmail.com> <4BA7828C.6060109@gmx.de> <1ccfd1c11003221308q1644bbb5w38141fe55a66ba8e@mail.gmail.com> <4BA7F9CD.7090601@gmx.de> Message-ID: <4BA8BAD9.1000809@gmx.de> Am 23.03.2010 00:14, schrieb Ulf Zibis: > Am 22.03.2010 21:08, schrieb Martin Buchholz: >> On Mon, Mar 22, 2010 at 07:45, Ulf Zibis wrote: >>> Am 21.03.2010 18:38, schrieb Martin Buchholz: >>>> Even if I'm wrong, and this cold code is actually hot, >>>> I don't think there will be a big performance loss. >>>> The method call is outside the loop. >>>> >>> What about at least reusing the cached values from calling method via >>> indexOfSupplementary(ch, fromIndex, value, offset, max - 1) ? >> No. We're optimizing for BMP. > > There would be no harm on BMP case speed. HotSpot wouldn't copy the > values to stack if (1) ch is a BMP character and (2) > indexOfSupplementary() becomes inlined. > I think Sherman is right in "dont have to have the redundant > value/offset=this.value/offset". Additionally if indexOfSupplementary() would be static, transfer of this pointer would be saved. -Ulf From christopher.hegarty at sun.com Tue Mar 23 13:57:59 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Tue, 23 Mar 2010 13:57:59 +0000 Subject: hg: jdk7/tl/jdk: 6614957: HttpsURLConnection not using the set SSLSocketFactory for creating all its Sockets; ... 
Message-ID: <20100323135911.52071447CD@hg.openjdk.java.net> Changeset: 8a9ebdc27045 Author: chegar Date: 2010-03-23 13:54 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/8a9ebdc27045 6614957: HttpsURLConnection not using the set SSLSocketFactory for creating all its Sockets 6771432: createSocket() - smpatch fails using 1.6.0_10 because of "Unconnected sockets not implemented" 6766775: X509 certificate hostname checking is broken in JDK1.6.0_10 Summary: All three bugs are interdependent Reviewed-by: xuelei ! src/share/classes/javax/net/SocketFactory.java ! src/share/classes/sun/net/NetworkClient.java ! src/share/classes/sun/net/www/protocol/https/HttpsClient.java ! src/share/classes/sun/security/ssl/SSLSocketImpl.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/DNSIdentities.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/HttpsCreateSockTest.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/HttpsSocketFacTest.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/IPAddressDNSIdentities.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/IPAddressIPIdentities.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/IPIdentities.java + test/sun/security/ssl/sun/net/www/protocol/https/HttpsURLConnection/Identities.java From Ulf.Zibis at gmx.de Tue Mar 23 16:11:48 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 17:11:48 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> Message-ID: <4BA8E844.7080901@gmx.de> Am 22.03.2010 23:36, schrieb Martin Buchholz: > Masayoshi, > > Ulf and I are working on a few changes to supplementary 
character handling. > Character.isSurrogate has already gone in. > > The following are in the pipeline: > > 6934268: Better implementation of Character.isValidCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint > 6934265: Add public method Character.isBMPCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint > [mq]: isBMPCodePoint2 > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2 > 6937112: String.lastIndexOf confused by unpaired trailing surrogate > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf > > In addition, Ulf and I would like to add > char Character.highSurrogate(int codePoint) > char Character.lowSurrogate(int codePoint) > > Ulf, > please provide me with your latest patch for Character.highSurrogate > and I will add it to the pipeline. Here it is. I couldn't resist from some beautifying, and purging of sun.nio.cs.Surrogate. Feel free to ignore it. -Ulf -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Character_highLowSurrogate URL: From Ulf.Zibis at gmx.de Tue Mar 23 17:59:36 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 18:59:36 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA8E844.7080901@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> Message-ID: <4BA90188.3090902@gmx.de> Am 23.03.2010 17:11, schrieb Ulf Zibis: > Am 22.03.2010 23:36, schrieb Martin Buchholz: >> Masayoshi, >> >> Ulf and I are working on a few changes to supplementary character >> handling. >> Character.isSurrogate has already gone in. 
>> >> The following are in the pipeline: >> >> 6934268: Better implementation of Character.isValidCodePoint >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint >> 6934265: Add public method Character.isBMPCodePoint >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint >> >> [mq]: isBMPCodePoint2 >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2 >> 6937112: String.lastIndexOf confused by unpaired trailing surrogate >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf >> >> In addition, Ulf and I would like to add >> char Character.highSurrogate(int codePoint) >> char Character.lowSurrogate(int codePoint) >> >> Ulf, >> please provide me with your latest patch for Character.highSurrogate >> and I will add it to the pipeline. > > Here it is. > > I couldn't resist from some beautifying, and purging of > sun.nio.cs.Surrogate. > Feel free to ignore it. > > -Ulf little correction -Ulf -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Character_highLowSurrogate URL: From Ulf.Zibis at gmx.de Tue Mar 23 18:17:39 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 19:17:39 +0100 Subject: Kinda ? In-Reply-To: <4BA80022.3010907@oracle.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA7F6EB.6040804@gmx.de> <4BA80022.3010907@oracle.com> Message-ID: <4BA905C3.8080902@gmx.de> Am 23.03.2010 00:41, schrieb David Holmes: > Ulf Zibis said the following on 03/23/10 09:02: >> Can somebody betray the sense of "Kinda" to me? > > PS. You really meant "convey the sense of" not "betray". :) The typical trap from using dictionaries. Thanks. I meant it in a kinda ironical sense for to break a secret. "verraten" in German has those 2 meanings. 
-Ulf From martinrb at google.com Tue Mar 23 18:19:11 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 23 Mar 2010 11:19:11 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA90188.3090902@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> Message-ID: <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> Ulf, Please do not delete methods in Surrogate.java (because we take compatibility seriously) but instead gently denigrate them, as I do below (added to my patch isBMPCodePoint2) diff --git a/src/share/classes/sun/nio/cs/Surrogate.java b/src/share/classes/sun/nio/cs/Surrogate.java --- a/src/share/classes/sun/nio/cs/Surrogate.java +++ b/src/share/classes/sun/nio/cs/Surrogate.java @@ -77,6 +77,7 @@ /** * Tells whether or not the given UCS-4 character must be represented as a * surrogate pair in UTF-16. + * Use of {@link Character#isSupplementaryCodePoint} is generally preferred. */ public static boolean neededFor(int uc) { return Character.isSupplementaryCodePoint(uc); @@ -102,6 +103,7 @@ /** * Converts the given surrogate pair into a 32-bit UCS-4 character. + * Use of {@link Character#toCodePoint} is generally preferred. 
*/ public static int toUCS4(char c, char d) { assert Character.isHighSurrogate(c) && Character.isLowSurrogate(d); Martin From Ulf.Zibis at gmx.de Tue Mar 23 18:31:22 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 23 Mar 2010 19:31:22 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> Message-ID: <4BA908FA.9040107@gmx.de> Ok, sorry and thanks. Wouldn't "deprecated" be more noticeable? What about using this message from compiler? : warning: Surrogate is Sun proprietary API and may be removed in a future release. @deprecated Public replacement is {@link Character#isSupplementaryCodePoint} -Ulf Am 23.03.2010 19:19, schrieb Martin Buchholz: > Ulf, > > Please do not delete methods in Surrogate.java > (because we take compatibility seriously) > but instead gently denigrate them, > as I do below (added to my patch isBMPCodePoint2) > > diff --git a/src/share/classes/sun/nio/cs/Surrogate.java > b/src/share/classes/sun/nio/cs/Surrogate.java > --- a/src/share/classes/sun/nio/cs/Surrogate.java > +++ b/src/share/classes/sun/nio/cs/Surrogate.java > @@ -77,6 +77,7 @@ > /** > * Tells whether or not the given UCS-4 character must be represented as a > * surrogate pair in UTF-16. > + * Use of {@link Character#isSupplementaryCodePoint} is generally > preferred. > */ > public static boolean neededFor(int uc) { > return Character.isSupplementaryCodePoint(uc); > @@ -102,6 +103,7 @@ > > /** > * Converts the given surrogate pair into a 32-bit UCS-4 character. > + * Use of {@link Character#toCodePoint} is generally preferred. 
> */ > public static int toUCS4(char c, char d) { > assert Character.isHighSurrogate(c)&& Character.isLowSurrogate(d); > > > Martin > > > From martinrb at google.com Tue Mar 23 19:17:16 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 23 Mar 2010 12:17:16 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA908FA.9040107@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> <4BA908FA.9040107@gmx.de> Message-ID: <1ccfd1c11003231217p85ce189j63831ee31604f7cc@mail.gmail.com> Deprecation of Surrogate methods is a reasonable choice. Of course, users are not supposed to use Surrogate, and they are regularly pestered by javac not to. I rejected deprecation in favor of "denigration" because there is nothing actually wrong with the existing methods except that they are not standardized. In particular, we would never dream of removing them. Surrogate has the very big advantage over the new methods in Character of being compatible with prior JDK releases. Deprecation is generally reserved for APIs that are actively harmful. Martin On Tue, Mar 23, 2010 at 11:31, Ulf Zibis wrote: > Ok, sorry and thanks. > > Wouldn't "deprecated" be more noticeable? > > What about using this message from compiler? : > warning: Surrogate is Sun proprietary API and may be removed in a future > release. 
> @deprecated Public replacement is {@link Character#isSupplementaryCodePoint} From martinrb at google.com Tue Mar 23 23:50:20 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 23 Mar 2010 16:50:20 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA90188.3090902@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> Message-ID: <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> I've added another mini-patch to my patch set. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint3 This deletes Surrogate.java, as Ulf wants, except that ... it's another variant of Surrogate.java! (which I didn't know existed) Uses of Surrogate.neededFor are all now changed to Character.isSupplementaryCodePoint, as suggested by Ulf. I intend to fold all of the isBMPCodePoint patches together into one before I commit them. Ulf, please review. Martin From jonathan.gibbons at sun.com Wed Mar 24 01:07:03 2010 From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com) Date: Wed, 24 Mar 2010 01:07:03 +0000 Subject: hg: jdk7/tl/langtools: 6937244: sqe ws7 tools javap/javap_t10a fail jdk7 b80 used output of javap is changed Message-ID: <20100324010711.722334408A@hg.openjdk.java.net> Changeset: dd30de080cb9 Author: jjg Date: 2010-03-23 18:05 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/dd30de080cb9 6937244: sqe ws7 tools javap/javap_t10a fail jdk7 b80 used output of javap is changed Reviewed-by: darcy ! 
src/share/classes/com/sun/tools/javap/ClassWriter.java + test/tools/javap/6937244/T6937244.java + test/tools/javap/6937244/T6937244A.java

From martinrb at google.com Tue Mar 23 22:50:09 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 23 Mar 2010 15:50:09 -0700 Subject: Concurrent calls to new Random() not random enough Message-ID: <1ccfd1c11003231550x5bc5509dy3061513adafd0e40@mail.gmail.com>

Hi Sherman,

This is a bug report (sorry, no fix this time)

Synopsis: Concurrent calls to new Random() not random enough
Description:
new Random() promises this:
    /**
     * Creates a new random number generator. This constructor sets
     * the seed of the random number generator to a value very likely
     * to be distinct from any other invocation of this constructor.
     */

but if there are concurrent calls to new Random(), it does not
do very well at fulfilling its contract.

The following program should print out a number much closer to 0.

import java.util.*;

public class RandomSeed {
    public static void main(String[] args) throws Throwable {
        class RandomCollector implements Runnable {
            int[] randoms = new int[1<<21];
            public void run() {
                for (int i = 0; i < randoms.length; i++)
                    randoms[i] = new Random().nextInt();
            }};
        final int threadCount = 2;
        List<RandomCollector> collectors = new ArrayList<RandomCollector>();
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < threadCount; i++) {
            RandomCollector r = new RandomCollector();
            collectors.add(r);
            threads.add(new Thread(r));
        }
        for (Thread thread : threads)
            thread.start();
        for (Thread thread : threads)
            thread.join();
        int collisions = 0;
        HashSet<Integer> s = new HashSet<Integer>();
        for (RandomCollector r : collectors) {
            for (int x : r.randoms) {
                if (s.contains(x))
                    collisions++;
                s.add(x);
            }
        }
        System.out.println(collisions);
    }
}
---
==> javac -source 1.6 -Xlint:all RandomSeed.java
==> java -esa -ea RandomSeed
876

From martinrb at google.com Tue Mar 23 22:59:29 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 23 Mar 2010 15:59:29 -0700 Subject: Sponsor for 6666666: A better
implementation of Character.isSupplementaryCodePoint In-Reply-To: <4BA8B285.1040403@gmx.de> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> Message-ID: <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> On Tue, Mar 23, 2010 at 05:22, Ulf Zibis wrote: > Am 13.03.2010 00:04, schrieb Martin Buchholz: >> >>> Remembers me that some months ago I prepared a beautified version of >>> Character's source (things like above, replacing ?against {@code}, >>> indentation inconsistencies etc.) Would there be interest to provide such >>> a >>> patch ? I support the plan of fixing coding style in core libraries when there is consensus amongst developers, as there is with => @code and @exception => @throws I think the right way to do this is to modify large portions of the java libraries using a script. The script should be checked into the jdk repo as part of the fix. There should be automated verification that the generated javadoc is left unchanged. There is precedent, for example the recent whitespace changes by Kelly, and my own fixes to @since in jdk6. To get you started, here is some elisp code that I have used when making such changes on a file-level: (defun tt-code () (interactive) (query-replace-regexp "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)" "{@code \\2}")) I suggest as a goal, modifying java.{lang,util,io,nio} Martin From i30817 at gmail.com Wed Mar 24 01:26:50 2010 From: i30817 at gmail.com (Paulo Levi) Date: Wed, 24 Mar 2010 01:26:50 +0000 Subject: Superpackages and final Message-ID: <212322091003231826r53533954n2f644a2551b9f916@mail.gmail.com> Do superpackages can mark exported classes as final? 
What I mean is: can I export a class "as final" instead of marking it final? I ask because in some situations separating an unusual object capability into a subtype would be advantageous memory-wise, but the type has to be final because of backward compatibility or design. I'm going to give a rather radical example from java.lang: String has a substring capability and it uses two int fields (of 3 + a char array) to do it, when it could return a private subclass of String on substring. It doesn't, serialization complications aside, I guess because String is designed not to be extended, for immutability concerns. The problem is that final has no granularity; it is all or nothing, namespace-wise. Well, I asked the question, but I don't have much hope. From i30817 at gmail.com Wed Mar 24 01:29:26 2010 From: i30817 at gmail.com (Paulo Levi) Date: Wed, 24 Mar 2010 01:29:26 +0000 Subject: Superpackages and final In-Reply-To: <212322091003231826r53533954n2f644a2551b9f916@mail.gmail.com> References: <212322091003231826r53533954n2f644a2551b9f916@mail.gmail.com> Message-ID: <212322091003231829g33a23f2bpb2ca5d2e7e6487c7@mail.gmail.com> An alternative for java++ would be an immutable keyword instead of final. Those classes would only be able to be extended if they were also immutable. Type systems are still so primitive...
From neal at gafter.com Wed Mar 24 02:02:22 2010 From: neal at gafter.com (Neal Gafter) Date: Tue, 23 Mar 2010 19:02:22 -0700 Subject: Superpackages and final In-Reply-To: <212322091003231826r53533954n2f644a2551b9f916@mail.gmail.com> References: <212322091003231826r53533954n2f644a2551b9f916@mail.gmail.com> Message-ID: <15e8b9d21003231902y6b033671t812d88e2adf89b7@mail.gmail.com> You could make the constructor module-private, so that the class can only be extended within the module, and use public static factory methods to create instances. You don't need any special language support to do that. On Tue, Mar 23, 2010 at 6:26 PM, Paulo Levi wrote: > Do superpackages can mark exported classes as final? > > What I mean is: can I export a class "as final" instead of marking it > final? > I ask because in some situations separating an unusual object capability > into a subtype would be advantageous memory-wise, but the type has to be > final because of backward compatibility or design. > > I'm going to give a rather radical example from java.lang: String has a > substring capability and it uses two int fields (of 3 + a char array) to do > it, when it could return a private subclass of String on substring. It > doesn't, serialization complications aside, I guess because String is > designed not to be extended, for immutability concerns. > > The problem is that final has no granularity; it is all or nothing, namespace-wise. > Well, I asked the question, but I don't have much hope. >
From i30817 at gmail.com Wed Mar 24 02:12:25 2010 From: i30817 at gmail.com (Paulo Levi) Date: Wed, 24 Mar 2010 02:12:25 +0000 Subject: Superpackages and final In-Reply-To: <15e8b9d21003231902y6b033671t812d88e2adf89b7@mail.gmail.com> References: <212322091003231826r53533954n2f644a2551b9f916@mail.gmail.com> <15e8b9d21003231902y6b033671t812d88e2adf89b7@mail.gmail.com> Message-ID: <212322091003231912k42b8183dj776ae316162f7ca1@mail.gmail.com> I see. Guess I didn't think it through. So making the constructor package-private while keeping the type public is enough to block extension... Not very self-documenting, though. From martinrb at google.com Wed Mar 24 02:17:20 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 23 Mar 2010 19:17:20 -0700 Subject: Concurrent calls to new Random() not random enough In-Reply-To: <1ccfd1c11003231550x5bc5509dy3061513adafd0e40@mail.gmail.com> References: <1ccfd1c11003231550x5bc5509dy3061513adafd0e40@mail.gmail.com> Message-ID: <1ccfd1c11003231917j13b11b2dl46d3a801ecd05919@mail.gmail.com> [+fy, jeremymanson] Here's a much better test case, and a proposed fix: http://cr.openjdk.java.net/~martin/webrevs/openjdk7/RandomSeedCollisions This adds some initialization overhead, but also removes some since new Random() no longer invokes a synchronized method.
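One way to keep default seeds distinct under concurrent construction without a synchronized method is a compare-and-set loop over an AtomicLong, mixed with System.nanoTime(). The sketch below illustrates only that general idea; it is not taken from the webrev, and the class name, the method names and both constants are assumptions chosen for illustration.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: lock-free generation of distinct default seeds.
final class SeedUniquifierSketch {
    // Assumed constants. The multiplier is odd, so x -> x * MULTIPLIER is a
    // bijection on 64-bit values and the sequence only repeats after a full cycle.
    private static final long MULTIPLIER = 181783497276652981L;
    private static final AtomicLong uniquifier = new AtomicLong(8682522807148012L);

    // Advances the shared counter with a CAS loop; each successful caller
    // gets a value that no other caller received.
    static long nextUniquifier() {
        for (;;) {
            long current = uniquifier.get();
            long next = current * MULTIPLIER;
            if (uniquifier.compareAndSet(current, next))
                return next;
        }
    }

    // Mixes the unique value with the clock so seeds also differ between JVM runs.
    static long nextSeed() {
        return nextUniquifier() ^ System.nanoTime();
    }

    public static void main(String[] args) {
        Random first = new Random(nextSeed());
        Random second = new Random(nextSeed());
        System.out.println(first.nextLong() + " " + second.nextLong());
    }
}
```

Two threads that read the same counter value cannot both succeed in compareAndSet; the loser retries with the updated value, which is what keeps the uniquifier values distinct without a lock.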
----
import java.util.*;

public class RandomSeed {
    public static void main(String[] args) throws Throwable {
        class RandomCollector implements Runnable {
            long[] randoms = new long[1<<22];
            public void run() {
                for (int i = 0; i < randoms.length; i++)
                    randoms[i] = new Random().nextLong();
            }};
        final int threadCount = 2;
        List<RandomCollector> collectors = new ArrayList<RandomCollector>();
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < threadCount; i++) {
            RandomCollector r = new RandomCollector();
            collectors.add(r);
            threads.add(new Thread(r));
        }
        for (Thread thread : threads)
            thread.start();
        for (Thread thread : threads)
            thread.join();
        int collisions = 0;
        HashSet<Long> s = new HashSet<Long>();
        for (RandomCollector r : collectors) {
            for (long x : r.randoms) {
                if (s.contains(x))
                    collisions++;
                s.add(x);
            }
        }
        System.out.printf("collisions=%d%n", collisions);
    }
}

On Tue, Mar 23, 2010 at 15:50, Martin Buchholz wrote:
> Hi Sherman,
>
> This is a bug report (sorry, no fix this time)
>
> Synopsis: Concurrent calls to new Random() not random enough
> Description:
> new Random() promises this:
>     /**
>      * Creates a new random number generator. This constructor sets
>      * the seed of the random number generator to a value very likely
>      * to be distinct from any other invocation of this constructor.
>      */
>
> but if there are concurrent calls to new Random(), it does not
> do very well at fulfilling its contract.
>
> The following program should print out a number much closer to 0.
>
> import java.util.*;
>
> public class RandomSeed {
>     public static void main(String[] args) throws Throwable {
>         class RandomCollector implements Runnable {
>             int[] randoms = new int[1<<21];
>             public void run() {
>                 for (int i = 0; i < randoms.length; i++)
>                     randoms[i] = new Random().nextInt();
>             }};
>         final int threadCount = 2;
>         List<RandomCollector> collectors = new ArrayList<RandomCollector>();
>         List<Thread> threads = new ArrayList<Thread>();
>         for (int i = 0; i < threadCount; i++) {
>             RandomCollector r = new RandomCollector();
>             collectors.add(r);
>             threads.add(new Thread(r));
>         }
>         for (Thread thread : threads)
>             thread.start();
>         for (Thread thread : threads)
>             thread.join();
>         int collisions = 0;
>         HashSet<Integer> s = new HashSet<Integer>();
>         for (RandomCollector r : collectors) {
>             for (int x : r.randoms) {
>                 if (s.contains(x))
>                     collisions++;
>                 s.add(x);
>             }
>         }
>         System.out.println(collisions);
>     }
> }
> ---
> ==> javac -source 1.6 -Xlint:all RandomSeed.java
> ==> java -esa -ea RandomSeed
> 876
>

From daniel.daugherty at sun.com Wed Mar 24 03:20:33 2010 From: daniel.daugherty at sun.com (daniel.daugherty at sun.com) Date: Wed, 24 Mar 2010 03:20:33 +0000 Subject: hg: jdk7/tl/jdk: 6915365: 3/4 assert(false, "Unsupported VMGlobal Type") at management.cpp:1540 Message-ID: <20100324032046.21DBD440B1@hg.openjdk.java.net> Changeset: f8c9a5e3f5db Author: dcubed Date: 2010-03-23 19:03 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/f8c9a5e3f5db 6915365: 3/4 assert(false,"Unsupported VMGlobal Type") at management.cpp:1540 Summary: Remove exception throw to decouple JDK and HotSpot additions of known types. Reviewed-by: mchung !
src/share/native/sun/management/Flag.c

From martinrb at google.com Wed Mar 24 07:32:13 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 24 Mar 2010 00:32:13 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA90188.3090902@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> Message-ID: <1ccfd1c11003240032y77a6b77fi73b39ea698673860@mail.gmail.com>

Hi Ulf,

You have this interesting optimization:

     public static boolean isSurrogate(char ch) {
-        return ch >= MIN_SURROGATE && ch < MAX_SURROGATE + 1;
+        return (ch -= MIN_SURROGATE) >= 0 && ch < MAX_SURROGATE + 1 - MIN_SURROGATE;
     }

Do you have any evidence that hotspot can produce better code from this,
or that there is a measurable performance improvement?
Or was this just an experiment?

Martin

From martinrb at google.com Wed Mar 24 08:24:26 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 24 Mar 2010 01:24:26 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BA56749.8020506@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> Message-ID: <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com>

Ulf, Sherman, Masayoshi,
here are changes for you to review.
Only the patch highSurrogate needs a separate bug filed
(and CCC, please)

Ulf, I've made some progress on integrating your changes,
although almost all of them have been somewhat martinized:

Ulf-style tidying, mostly whitespace.
[mq]: Character-warnings2
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings2

Very minor optimizations. Barely worth doing.
Note my removal of the need to have n++ inside the loop.
imported patch ulf-opto http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ulf-opto Addition of highSurrogate and lowSurrogate imported patch highSurrogate http://cr.openjdk.java.net/~martin/webrevs/openjdk7/highSurrogate Martin On Sat, Mar 20, 2010 at 17:24, Ulf Zibis wrote: > Sherman, please again consider about shifting Surrogate.high/low to > Character.high/lowSurrogate. From martinrb at google.com Wed Mar 24 08:32:28 2010 From: martinrb at google.com (Martin Buchholz) Date: Wed, 24 Mar 2010 01:32:28 -0700 Subject: hg: jdk7/tl/jdk: 6860431: Character.isSurrogate(char ch) In-Reply-To: <1ccfd1c10909021329i34005b1bi5816e695d71a174d@mail.gmail.com> References: <20090831221217.2CEFA12912@hg.openjdk.java.net> <4A9CDB81.1050500@gmx.de> <1ccfd1c10909012021g78d4fa3cx5f6ab0792c3ba688@mail.gmail.com> <4A9E27BF.8000905@gmx.de> <1ccfd1c10909020927v74fe5ceekc91f4e4a4724a273@mail.gmail.com> <4A9E9FE9.7060107@redhat.com> <1ccfd1c10909021003o7b060a23ge700680cd75b07bf@mail.gmail.com> <4A9EA759.3050804@redhat.com> <4A9ECBAC.7060303@gmx.de> <1ccfd1c10909021329i34005b1bi5816e695d71a174d@mail.gmail.com> Message-ID: <1ccfd1c11003240132i35b9a24fldc8b4defb24364bb@mail.gmail.com> Xueming, I believe you still owe me a review and bug filed for http://cr.openjdk.java.net/~martin/webrevs/openjdk7/javadoc-unicode-escapes/ Martin On Wed, Sep 2, 2009 at 13:29, Martin Buchholz wrote: > On Wed, Sep 2, 2009 at 12:46, Ulf Zibis wrote: >> Am 02.09.2009 19:11, David M. Lloyd schrieb: >>> >>> On 09/02/2009 12:03 PM, Martin Buchholz wrote: >>>> >>>> On Wed, Sep 2, 2009 at 09:40, David M. Lloyd >>> > wrote: >>>> Why not just do {@code \uD800}? I'm like 60% sure that would work >>>> just fine. :-) >>>> >>>> >>>> I'm pretty sure it would fail. Prove me wrong! >>>> Searching the JDK sources for regex >>>> ^ *\*.*\\u[0-9a-fA-F]{4} >>>> is a good way to find javadoc bugs, e.g. >>>> http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase() >>> >>> Ah, you're right.
It worked in my previewer but not in the actual >>> javadoc. It's pretty bad that that sequence has special meaning but you >>> can't escape a \ with another \. I guess in the worst case you could always >>> do \u005CD800 or something like that. >>> >> >> Looks little better, but not much. Did somebody tried it (Martin)? > > Well.... learn something new every day. > Let's turn this into a fix. > It's yet another "turkish i" bug. > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/javadoc-unicode-escapes/ > > Xueming, please file a bug and review. > > Synopsis: Unreadable \uXXXX in javadoc > Description: Replace \uXXXX by \u005CXXXX, or simply delete > > Martin > >> If it works in a previewer, is there any chance to change the javadoc spec, >> staying backwards compatible? >> >> -Ulf >> >> >> > From Ulf.Zibis at gmx.de Wed Mar 24 17:20:17 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Wed, 24 Mar 2010 18:20:17 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B8EB46C.1010208@sun.com> <4B92C263.9020404@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> Message-ID: <4BAA49D1.8010702@gmx.de> Am 23.03.2010 23:59, schrieb Martin Buchholz: > On Tue, Mar 23, 2010 at 05:22, Ulf Zibis wrote: > >> Am 13.03.2010 00:04, schrieb Martin Buchholz: >> >>> >>>> Remembers me that some months ago I prepared a beautified version of >>>> Character's source (things like above, replacing against {@code}, >>>> indentation inconsistencies etc.) Would there be interest to provide such >>>> a >>>> patch ?
>>>>
> I support the plan of fixing coding style in core libraries when there is
> consensus amongst developers, as there is with
> => @code
> and
> @exception => @throws
>

I too would like to see 8 spaces indentation on line breaks like:
    if (aaaaaaaaaaaaaaa > bbbbbbbbbbbbb &&
            ccccccccccccccc > ddddddddddddddddd)
        doSomething();
+ opening braces at line end instead of beginning a new line
+ blank line between package ... and import ...
+ no blank line between javadoc and class/method declaration
+ 2 spaces after period
+ proper indentation in @param @return @throws blocks
+ not too much use of braces, e.g. for 1-line blocks (one can see more code lines on the same screen space)
+ * @see #forDigit(int, int)
  * @see Integer#toString(int, int)
  instead of:
  * @see java.lang.Character#forDigit(int, int)
  * @see java.lang.Integer#toString(int, int)
+ * range: U+DC00 through U+DFFF
  instead of:
  * range: 0xDC00 through 0xDFFF
+ {@link #isLowSurrogate(char)}
  {@link Character.UnicodeBlock}
  instead of:
  {@linkplain #isLowSurrogate(char) isLowSurrogate}
  {@link Character.UnicodeBlock UnicodeBlock}

> I think the right way to do this is to modify large portions of the
> java libraries using a script. The script should be checked into
> the jdk repo as part of the fix. There should be automated verification
> that the generated javadoc is left unchanged.
>
> There is precedent, for example the recent whitespace changes by Kelly,
> and my own fixes to @since in jdk6.
>
> To get you started, here is some elisp code that I have used when
> making such changes on a file-level:
>
> (defun tt-code ()
>   (interactive)
>   (query-replace-regexp "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)"
>                         "{@code \\2}"))
>
> I suggest as a goal, modifying java.{lang,util,io,nio}
>

That all sounds very good, so I should hold back my hand work.
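
For anyone scripting this outside Emacs, the same file-level substitution can be sketched in Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the closing-tag match (`</\1>`) is an assumption, since the pattern as quoted in the message ends at the captured text. The character class mirrors the quoted one, so text containing `&`, `<`, `>` or a backslash is left alone for manual review.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a <tt>/<code> to {@code ...} rewrite for javadoc text.
final class TtToCodeSketch {
    // Group 1: tag name; group 2: simple text without &, <, > or backslash.
    // The closing tag must repeat the opening tag name (an assumption, see above).
    private static final Pattern TT_OR_CODE =
        Pattern.compile("<(tt|code)>([^&<>\\\\]+)</\\1>");

    // Rewrites every simple <tt>x</tt> or <code>x</code> in one line of javadoc.
    static String convert(String javadocLine) {
        return TT_OR_CODE.matcher(javadocLine).replaceAll("{@code $2}");
    }

    public static void main(String[] args) {
        System.out.println(convert(" * Returns <tt>true</tt> if <code>ch</code> is a surrogate."));
    }
}
```

Unlike the interactive query-replace, this rewrites every match without asking, so the suggested check that the generated javadoc is unchanged would still be needed.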
-Ulf From Xueming.Shen at Sun.COM Wed Mar 24 17:22:47 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Wed, 24 Mar 2010 10:22:47 -0700 Subject: hg: jdk7/tl/jdk: 6860431: Character.isSurrogate(char ch) In-Reply-To: <1ccfd1c11003240132i35b9a24fldc8b4defb24364bb@mail.gmail.com> References: <20090831221217.2CEFA12912@hg.openjdk.java.net> <4A9CDB81.1050500@gmx.de> <1ccfd1c10909012021g78d4fa3cx5f6ab0792c3ba688@mail.gmail.com> <4A9E27BF.8000905@gmx.de> <1ccfd1c10909020927v74fe5ceekc91f4e4a4724a273@mail.gmail.com> <4A9E9FE9.7060107@redhat.com> <1ccfd1c10909021003o7b060a23ge700680cd75b07bf@mail.gmail.com> <4A9EA759.3050804@redhat.com> <4A9ECBAC.7060303@gmx.de> <1ccfd1c10909021329i34005b1bi5816e695d71a174d@mail.gmail.com> <1ccfd1c11003240132i35b9a24fldc8b4defb24364bb@mail.gmail.com> Message-ID: <4BAA4A67.30802@sun.com> CR 6937842 Created, P4 java/classes_lang Unreadable \uXXXX in javadoc The change fine. But maybe it would be better to "escape" the \u20ac as well, instead of simply deleting them. Not a big deal. -Sherman Martin Buchholz wrote: > Xueming, > > I believe you still owe me a review and bug filed for > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/javadoc-unicode-escapes/ > > Martin > > On Wed, Sep 2, 2009 at 13:29, Martin Buchholz wrote: > >> On Wed, Sep 2, 2009 at 12:46, Ulf Zibis wrote: >> >>> Am 02.09.2009 19:11, David M. Lloyd schrieb: >>> >>>> On 09/02/2009 12:03 PM, Martin Buchholz wrote: >>>> >>>>> On Wed, Sep 2, 2009 at 09:40, David M. Lloyd >>>> > wrote: >>>>> Why not just do {@code \uD800}? I'm like 60% sure that would work >>>>> just fine. :-) >>>>> >>>>> >>>>> I'm pretty sure it would fail. Prove me wrong! >>>>> Searching the JDK sources for regex >>>>> ^ *\*.*\\u[0-9a-fA-F]{4} >>>>> is a good way to find javadoc bugs, e.g. >>>>> http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase() >>>>> >>>> Ah, you're right. It worked in my previewer but not in the actual >>>> javadoc. 
It's pretty bad that that sequence has special meaning but you >>>> can't escape a \ with another \. I guess in the worst case you could always >>>> do \u005CD800 or something like that. >>>> >>>> >>> Looks little better, but not much. Did somebody tried it (Martin)? >>> >> Well.... learn something new every day. >> Let's turn this into a fix. >> It's yet another "turkish i" bug. >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/javadoc-unicode-escapes/ >> >> Xueming, please file a bug and review. >> >> Synopsis: Unreadable \uXXXX in javadoc >> Description: Replace \uXXXX by \u005CXXXX, or simply delete >> >> Martin >> >> >>> If it works in a previewer, is there any chance to change the javadoc spec, >>> staying backwards compatible? >>> >>> -Ulf >>> >>> >>> >>> From Xueming.Shen at Sun.COM Wed Mar 24 17:42:10 2010 From: Xueming.Shen at Sun.COM (Xueming Shen) Date: Wed, 24 Mar 2010 10:42:10 -0700 Subject: Concurrent calls to new Random() not random enough In-Reply-To: <1ccfd1c11003231917j13b11b2dl46d3a801ecd05919@mail.gmail.com> References: <1ccfd1c11003231550x5bc5509dy3061513adafd0e40@mail.gmail.com> <1ccfd1c11003231917j13b11b2dl46d3a801ecd05919@mail.gmail.com> Message-ID: <4BAA4EF2.1080306@sun.com> 6937857: Concurrent calls to new Random() not random enough Martin Buchholz wrote: > [+fy, jeremymanson] > > Here's a much better test case, > and a proposed fix: > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/RandomSeedCollisions > > This adds some initialization overhead, but also removes some > since > new Random() > no longer invokes a synchronized method. 
>
> ----
> import java.util.*;
>
> public class RandomSeed {
>     public static void main(String[] args) throws Throwable {
>         class RandomCollector implements Runnable {
>             long[] randoms = new long[1<<22];
>             public void run() {
>                 for (int i = 0; i < randoms.length; i++)
>                     randoms[i] = new Random().nextLong();
>             }};
>         final int threadCount = 2;
>         List<RandomCollector> collectors = new ArrayList<RandomCollector>();
>         List<Thread> threads = new ArrayList<Thread>();
>         for (int i = 0; i < threadCount; i++) {
>             RandomCollector r = new RandomCollector();
>             collectors.add(r);
>             threads.add(new Thread(r));
>         }
>         for (Thread thread : threads)
>             thread.start();
>         for (Thread thread : threads)
>             thread.join();
>         int collisions = 0;
>         HashSet<Long> s = new HashSet<Long>();
>         for (RandomCollector r : collectors) {
>             for (long x : r.randoms) {
>                 if (s.contains(x))
>                     collisions++;
>                 s.add(x);
>             }
>         }
>         System.out.printf("collisions=%d%n", collisions);
>     }
> }
>
>
> On Tue, Mar 23, 2010 at 15:50, Martin Buchholz wrote:
>
>> Hi Sherman,
>>
>> This is a bug report (sorry, no fix this time)
>>
>> Synopsis: Concurrent calls to new Random() not random enough
>> Description:
>> new Random() promises this:
>> /**
>>  * Creates a new random number generator. This constructor sets
>>  * the seed of the random number generator to a value very likely
>>  * to be distinct from any other invocation of this constructor.
>>  */
>>
>> but if there are concurrent calls to new Random(), it does not
>> do very well at fulfilling its contract.
>>
>> The following program should print out a number much closer to 0.
>>
>> import java.util.*;
>>
>> public class RandomSeed {
>>     public static void main(String[] args) throws Throwable {
>>         class RandomCollector implements Runnable {
>>             int[] randoms = new int[1<<21];
>>             public void run() {
>>                 for (int i = 0; i < randoms.length; i++)
>>                     randoms[i] = new Random().nextInt();
>>             }};
>>         final int threadCount = 2;
>>         List<RandomCollector> collectors = new ArrayList<RandomCollector>();
>>         List<Thread> threads = new ArrayList<Thread>();
>>         for (int i = 0; i < threadCount; i++) {
>>             RandomCollector r = new RandomCollector();
>>             collectors.add(r);
>>             threads.add(new Thread(r));
>>         }
>>         for (Thread thread : threads)
>>             thread.start();
>>         for (Thread thread : threads)
>>             thread.join();
>>         int collisions = 0;
>>         HashSet<Integer> s = new HashSet<Integer>();
>>         for (RandomCollector r : collectors) {
>>             for (int x : r.randoms) {
>>                 if (s.contains(x))
>>                     collisions++;
>>                 s.add(x);
>>             }
>>         }
>>         System.out.println(collisions);
>>     }
>> }
>> ---
>> ==> javac -source 1.6 -Xlint:all RandomSeed.java
>> ==> java -esa -ea RandomSeed
>> 876
>>
>>

From martinrb at google.com Wed Mar 24 18:48:27 2010
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 24 Mar 2010 11:48:27 -0700
Subject: hg: jdk7/tl/jdk: 6860431: Character.isSurrogate(char ch)
In-Reply-To: <4BAA4A67.30802@sun.com>
References: <20090831221217.2CEFA12912@hg.openjdk.java.net>
	<4A9E27BF.8000905@gmx.de>
	<1ccfd1c10909020927v74fe5ceekc91f4e4a4724a273@mail.gmail.com>
	<4A9E9FE9.7060107@redhat.com>
	<1ccfd1c10909021003o7b060a23ge700680cd75b07bf@mail.gmail.com>
	<4A9EA759.3050804@redhat.com>
	<4A9ECBAC.7060303@gmx.de>
	<1ccfd1c10909021329i34005b1bi5816e695d71a174d@mail.gmail.com>
	<1ccfd1c11003240132i35b9a24fldc8b4defb24364bb@mail.gmail.com>
	<4BAA4A67.30802@sun.com>
Message-ID: <1ccfd1c11003241148g62745800o3f5f81b2a38e9215@mail.gmail.com>

On Wed, Mar 24, 2010 at 10:22, Xueming Shen wrote:
>
> CR 6937842 Created, P4 java/classes_lang Unreadable \uXXXX in javadoc

Thanks.

> The change fine.
But maybe it would be better to "escape" the \u20ac as
> well, instead of
> simply deleting them. Not a big deal.

I prefer to leave them out, because the example has nothing to do
with exotic characters.

Martin

From jonathan.gibbons at sun.com Wed Mar 24 19:19:58 2010
From: jonathan.gibbons at sun.com (jonathan.gibbons at sun.com)
Date: Wed, 24 Mar 2010 19:19:58 +0000
Subject: hg: jdk7/tl/langtools: 6937318: jdk7 b86: javah and javah -help is no output for these commands
Message-ID: <20100324192003.81CA4441A5@hg.openjdk.java.net>

Changeset: 3058880c0b8d
Author: jjg
Date: 2010-03-24 12:18 -0700
URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/3058880c0b8d

6937318: jdk7 b86: javah and javah -help is no output for these commands
Reviewed-by: darcy

! src/share/classes/com/sun/tools/javah/JavahTask.java
! test/tools/javah/T6893943.java

From martinrb at google.com Wed Mar 24 19:34:13 2010
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 24 Mar 2010 12:34:13 -0700
Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint
In-Reply-To: <4BAA49D1.8010702@gmx.de>
References: <4A95079A.8080803@gmx.de>
	<1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com>
	<4B97E3BD.2000901@sun.com>
	<4B99373E.40502@gmx.de>
	<1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com>
	<4B995D22.2020507@gmx.de>
	<1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com>
	<4BA8B285.1040403@gmx.de>
	<1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com>
	<4BAA49D1.8010702@gmx.de>
Message-ID: <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com>

On Wed, Mar 24, 2010 at 10:20, Ulf Zibis wrote:
> On 23.03.2010 23:59, Martin Buchholz wrote:

> I too would like to see 8 spaces indentation on line breaks like:
>     if (aaaaaaaaaaaaaaa > bbbbbbbbbbbbb &&
>             ccccccccccccccc > ddddddddddddddddd)
>         doSomething();

This appears to be a new style (perhaps coming from the java IDEs?)
but it would be too pervasive a change for the JDK sources.

> + opening braces at line end instead beginning a new line

Perhaps too difficult/controversial?

> + blank line between package ... and import ...

This could be done, and automated.

>
> + no blank line between javadoc and class/method declaration

Yes.

> + 2 spaces after period

I agree with this style, but there is not enough consensus.

> + proper indentation in @param @return @throws blocks

Perhaps too difficult to automate?

> + not too much use of braces e.g. for 1-line blocks (one can see more code
> lines on same screen space)

I agree with this personally, but there is violent disagreement
in the java programmer community. E.g. google's style guide
requires braces everywhere.

> +
>      * @see    #forDigit(int, int)
>      * @see    Integer#toString(int, int)
> instead:
>      * @see     java.lang.Character#forDigit(int, int)
>      * @see     java.lang.Integer#toString(int, int)

I did a global s/java\.lang\.// in Character.java.

> +
>          * range: U+DC00 through U+DFFF
> instead
>          * range: 0xDC00 through 0xDFFF

I disagree. The U+ notation should be reserved for
Unicode characters (code points) and not UTF-16
code units (which surrogates are).

> +
>     {@link #isLowSurrogate(char)}
>     {@link Character.UnicodeBlock}
> instead
>     {@linkplain #isLowSurrogate(char) isLowSurrogate}
>     {@link Character.UnicodeBlock UnicodeBlock}

I've removed the above.

Martin

From joe.darcy at sun.com Thu Mar 25 00:03:57 2010
From: joe.darcy at sun.com (joe.darcy at sun.com)
Date: Thu, 25 Mar 2010 00:03:57 +0000
Subject: hg: jdk7/tl/langtools: 6937417: javac -Xprint returns IndexOutOfBoundsException
Message-ID: <20100325000401.C8A36441EB@hg.openjdk.java.net>

Changeset: 65e422bbb984
Author: darcy
Date: 2010-03-24 17:02 -0700
URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/65e422bbb984

6937417: javac -Xprint returns IndexOutOfBoundsException
Reviewed-by: jjg

!
src/share/classes/com/sun/tools/javac/processing/PrintingProcessor.java + test/tools/javac/processing/model/util/elements/VacuousEnum.java From weijun.wang at sun.com Thu Mar 25 04:09:04 2010 From: weijun.wang at sun.com (weijun.wang at sun.com) Date: Thu, 25 Mar 2010 04:09:04 +0000 Subject: hg: jdk7/tl/jdk: 6813340: X509Factory should not depend on is.available()==0 Message-ID: <20100325040917.7E9654422E@hg.openjdk.java.net> Changeset: 26477628f2d5 Author: weijun Date: 2010-03-25 12:07 +0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/26477628f2d5 6813340: X509Factory should not depend on is.available()==0 Reviewed-by: xuelei ! src/share/classes/sun/security/provider/X509Factory.java ! src/share/classes/sun/security/tools/KeyTool.java + test/java/security/cert/CertificateFactory/ReturnStream.java + test/java/security/cert/CertificateFactory/SlowStream.java + test/java/security/cert/CertificateFactory/slowstream.sh From christopher.hegarty at sun.com Thu Mar 25 09:40:31 2010 From: christopher.hegarty at sun.com (christopher.hegarty at sun.com) Date: Thu, 25 Mar 2010 09:40:31 +0000 Subject: hg: jdk7/tl/jdk: 6937703: java/net regression test issues with samevm Message-ID: <20100325094056.833F744286@hg.openjdk.java.net> Changeset: 6109b166bf68 Author: chegar Date: 2010-03-25 09:38 +0000 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/6109b166bf68 6937703: java/net regression test issues with samevm Reviewed-by: alanb ! test/ProblemList.txt ! test/java/net/ProxySelector/B6737819.java ! test/java/net/ResponseCache/ResponseCacheTest.java ! test/java/net/ResponseCache/getResponseCode.java ! test/java/net/URL/TestIPv6Addresses.java ! test/java/net/URLClassLoader/HttpTest.java ! test/java/net/URLConnection/B5052093.java ! 
test/java/net/URLConnection/contentHandler/UserContentHandler.java From Ulf.Zibis at gmx.de Thu Mar 25 13:41:31 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 14:41:31 +0100 Subject: Review patches isBMPCodePoint/2/3 In-Reply-To: <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> Message-ID: <4BAB680B.7050606@gmx.de> Am 23.03.2010 19:19, schrieb Martin Buchholz: > Ulf, > > Please do not delete methods in Surrogate.java > (because we take compatibility seriously) > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint I still think, we should stick on Surrogate#isBMP for above compatibility reason. Otherwise we too should rename #neededFor etc. Please add @author Ulf Zibis and correct copyright date. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2 http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/src/share/classes/java/lang/AbstractStringBuilder.java.sdiff.html Looks good, but please add @author Ulf Zibis and correct copyright date. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/src/share/classes/java/lang/Character.java.sdiff.html Looks good, but please add @author Ulf Zibis and correct copyright date. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/src/share/classes/java/lang/String.java.sdiff.html Looks good, but please add @author Ulf Zibis and correct copyright date. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint2/src/share/classes/sun/nio/cs/Surrogate.java.sdiff.html Looks good, but I still think we should use deprecated. 
"deprecate" just means "don't use it if even possible", IMO not only for APIs that are actively harmful. Imagine, users code relies on existing sun package API, because there was no appropriate public API in the past. He is *used to ignore* the warning: Surrogate is Sun proprietary API and may be removed in a future release. "Deprecated" will give him new attention, so it's likely, he will notice, that there are new API's since JDK 7. http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint3 http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint3/test/java/nio/charset/coders/BashStreams.java.sdiff.html if (Character.isBMPCodePoint(c) && (c >= '\uFFFE' || Character.isSurrogate((char))c))); Please use 8-space-indentation for line continuation, following looks ugly: 259 if (Character.isHighSurrogate(c) 260 && (cb.remaining() == 1)) { 261 cg.push(c); 262 break; 263 } -Ulf > but instead gently denigrate them, > as I do below (added to my patch isBMPCodePoint2) > > diff --git a/src/share/classes/sun/nio/cs/Surrogate.java > b/src/share/classes/sun/nio/cs/Surrogate.java > --- a/src/share/classes/sun/nio/cs/Surrogate.java > +++ b/src/share/classes/sun/nio/cs/Surrogate.java > @@ -77,6 +77,7 @@ > /** > * Tells whether or not the given UCS-4 character must be represented as a > * surrogate pair in UTF-16. > + * Use of {@link Character#isSupplementaryCodePoint} is generally > preferred. > */ > public static boolean neededFor(int uc) { > return Character.isSupplementaryCodePoint(uc); > @@ -102,6 +103,7 @@ > > /** > * Converts the given surrogate pair into a 32-bit UCS-4 character. > + * Use of {@link Character#toCodePoint} is generally preferred. 
> */ > public static int toUCS4(char c, char d) { > assert Character.isHighSurrogate(c)&& Character.isLowSurrogate(d); > > > Martin > > > From Ulf.Zibis at gmx.de Thu Mar 25 14:03:47 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 15:03:47 +0100 Subject: Review patches isBMPCodePoint/2/3 In-Reply-To: <4BAB680B.7050606@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> <4BAB680B.7050606@gmx.de> Message-ID: <4BAB6D43.3010808@gmx.de> Am 25.03.2010 14:41, schrieb Ulf Zibis: > Please use 8-space-indentation for line continuation, following looks > ugly: Oops, looked good in my TB edit window, but should be corrected: 259 if (Character.isHighSurrogate(c) 260 && (cb.remaining() == 1)) { 261 cg.push(c); 262 break; 263 } From Ulf.Zibis at gmx.de Thu Mar 25 15:26:13 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 16:26:13 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003221503r46e6bb78g241e2b07ff7f1b3c@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <4BA60A40.9050600@gmx.de> <1ccfd1c11003210916rad35d31wb0501f9bf960b07@mail.gmail.com> <4BA78011.2070504@gmx.de> <1ccfd1c11003221503r46e6bb78g241e2b07ff7f1b3c@mail.gmail.com> Message-ID: <4BAB8095.8030903@gmx.de> Am 22.03.2010 23:03, schrieb Martin Buchholz: > On Mon, Mar 22, 2010 at 07:34, Ulf Zibis wrote: > >> Am 21.03.2010 17:16, schrieb Martin Buchholz: >> > >>> There is a debate about whether to 
reuse existing exception classes >>> or to throw class-specific subclasses. IMO, IOOBE is a sufficiently >>> expressive >>> exception that I might have used just that, with expressive detail >>> messages. >>> >>> >> I'm with you. Especially StringIndexOutOfBoundsException appears as >> superfluous sugar to me. But we have it in the docs, so there is no way to >> get rid of it. >> What do you think about to refactor most IOOBEs in String related classes to >> SIOOBEs? It would stay compatible to old Software, which still catches >> IOOBEs, but would look more straight, tidy and clean and fix the below >> mentioned bug. >> > Every change is an incompatible change, with a risk/benefit tradeoff. > > IMO there is no change to the exceptions thrown, or declared to be thrown, > or to their detail messages, in the string classes that is worth the risk > of incompatible change. > Is somewhat reasonable, but what's the win of those "creative" variations on exception messages _and_ types in AbstractStringBuilder? : throw new StringIndexOutOfBoundsException(); throw new StringIndexOutOfBoundsException(index); throw new StringIndexOutOfBoundsException(start); throw new StringIndexOutOfBoundsException("start > length()"); throw new StringIndexOutOfBoundsException("start > end"); throw new StringIndexOutOfBoundsException(end - start); throw new StringIndexOutOfBoundsException(srcEnd); throw new StringIndexOutOfBoundsException("srcBegin > srcEnd"); throw new IndexOutOfBoundsException(); throw new IndexOutOfBoundsException("start " + start + ", end " + end + ", s.length() " + s.length()); throw new IndexOutOfBoundsException("dstOffset "+dstOffset); > (with the exception of when the implementation contradicts the spec, > which is worth fixing) > #insert(int, char[], in, int), uses System.arraycopy(). If capacity doesn't suffice, it would throw an IOOBE, not SIOOBE #insert(int, CharSequence) states: * @throws IndexOutOfBoundsException if the offset is invalid. 
but (1) in fact throws SIOOBE in described case, if CharSequence is of String. and (2) additionally throws IOOBE in case of capacity overflow, which is not mentioned. #insert(...) methods mix between (int index, ...) and (int dstIndex, ...) without any reason. #substring(int) could be faster not using substring(int, int) detailed bounds checking. #subSequence(int, int) in fact throws SIOOBE instead IOOBE. #appendCodePoint(int) could throw AIOOBE, similar to many other append methods, capacity overflow behaviour is not documented. I stop here ... ;-) -Ulf From Ulf.Zibis at gmx.de Thu Mar 25 16:18:44 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 17:18:44 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003240032y77a6b77fi73b39ea698673860@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003240032y77a6b77fi73b39ea698673860@mail.gmail.com> Message-ID: <4BAB8CE4.8020804@gmx.de> Am 24.03.2010 08:32, schrieb Martin Buchholz: > Hi Ulf, > > You have this interesting optimization: > > public static boolean isSurrogate(char ch) { > - return ch>= MIN_SURROGATE&& ch< MAX_SURROGATE + 1; > + return (ch -= MIN_SURROGATE)>= 0&& ch< MAX_SURROGATE + 1 - > MIN_SURROGATE; > } > > Do you have any evidence that hotspot can produce better code from this, > or that there is a measurable performance improvement? > Or was this just an experiment? > If isHighSurrogate and isSurrogate are used consecutive on same char, result of ch -= MIN_SURROGATE could be used for both. If isLowSurrogate and isSurrogate are used consecutive on same char, result of ch -= MAX_SURROGATE would fit better. If isHighSurrogate and isLowSurrogate are used consecutive on same char, result of ch -= MIN_LOW_SURROGATE would fit better. 
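A minimal sketch of the offset-style range check discussed above, for readers who want to try it. It is not the JDK's implementation; the class and method names (`SurrogateRangeCheck`, `isSurrogatePlain`, `isSurrogateOffset`, `countMismatches`) are made up for illustration. Because `char` is unsigned, the `>= 0` half of the quoted expression is always true, so the sketch keeps only the single upper-bound comparison. Whether HotSpot produces better code for this form is exactly the open question in the thread, and the sketch only checks that both forms agree.

```java
public class SurrogateRangeCheck {
    // Straightforward form: two comparisons against the range bounds.
    static boolean isSurrogatePlain(char ch) {
        return ch >= Character.MIN_SURROGATE && ch < Character.MAX_SURROGATE + 1;
    }

    // Offset form: subtract the lower bound once, then compare once.
    // The cast back to char wraps modulo 2^16, so values below
    // MIN_SURROGATE become large and fail the upper-bound test.
    static boolean isSurrogateOffset(char ch) {
        char offset = (char) (ch - Character.MIN_SURROGATE);
        return offset < Character.MAX_SURROGATE + 1 - Character.MIN_SURROGATE;
    }

    // Counts the char values on which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            if (isSurrogatePlain(ch) != isSurrogateOffset(ch))
                mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches=" + countMismatches());
    }
}
```

The same subtracted value could in principle feed a following high-surrogate or low-surrogate test, which is the reuse the message above has in mind.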
I suggest using 1st pair in JDK library. -Ulf From Ulf.Zibis at gmx.de Thu Mar 25 17:19:06 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 18:19:06 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> Message-ID: <4BAB9B0A.7030207@gmx.de> Am 24.03.2010 09:24, schrieb Martin Buchholz: > Ulf, Sherman, Masayoshi, > here are changes for you to review. > Only the patch highSurrogate needs a separate bug filed > (and CCC, please) > > Ulf, I've made some progress on integrating your changes, > although almost all of them have been somewhat martinized: > > Ulf-style tidying, mostly whitespace. > [mq]: Character-warnings2 > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings2 > I would prefer (better visibility of continued line): public final class Character implements java.io.Serializable, Comparable { I would prefer (indicates, that we are in current class): #isDigit(char) instead Character#isDigit(char) but indeed better than java.lang.Character#isDigit(char) > Very minor optimizations. Barely worth doing. > Note my removal of the need to have n++ inside the loop. > Overseen. Shame on me, as that's true Ulf-style. Yes, reduces in/decrements on rare supplementary cases. > imported patch ulf-opto > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ulf-opto > > Addition of highSurrogate and lowSurrogate > imported patch highSurrogate > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/highSurrogate > Looks good. Interesting workaround on my "Note:" I've reckoned with dropping my highSurrogate(char highCPWord, char lowCPWord). Anyway I like to note, that I use that shortcut in my EUC_TW$Decoder twiddling. 
Following code: da[dp] = Character.highSurrogate(0x20000 + c); results in (19 bytes): 0x00b8ae27: add $0x20000,%ecx ;*iadd ; - sun.nio.cs.ext.D_21_d_narrow::decode at 98 (line 196) 0x00b8ae2d: mov %ecx,%ebp 0x00b8ae2f: shr $0xa,%ebp 0x00b8ae32: add $0xd7c0,%ebp ;*isub ; - java.lang.Character::highSurrogate at 9 (line 3343) ; - sun.nio.cs.ext.D_21_d_narrow::decode at 99 (line 196) da[dp] = Character.highSurrogate((char)0x2, c); results in (9 bytes): 0x00b899e7: shr $0xa,%ebp 0x00b899ea: add $0xd840,%ebp ;*isub ; - java.lang.Character::highSurrogate at 14 (line 3365) ; - sun.nio.cs.ext.D_22_d_n_fastSurrogate::decode at 97 (line 196) dst.putInt(Character.highSurrogate((char)0x2, c)) << 16 | Character.lowSurrogate(c)); would additionally increase performance. I'm still preparing the benchmark + disassembly. Those twiddling could be used in all surrogate processing charset coders, e.g. maybe true for UTF_x. If public, would be too useful for developers coding charset coders for exotic charsets via java.nio.charset.spi.CharsetProvider -Ulf From Ulf.Zibis at gmx.de Thu Mar 25 17:20:20 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 18:20:20 +0100 Subject: hg: jdk7/tl/jdk: 6860431: Character.isSurrogate(char ch) In-Reply-To: <1ccfd1c11003241148g62745800o3f5f81b2a38e9215@mail.gmail.com> References: <20090831221217.2CEFA12912@hg.openjdk.java.net> <4A9E27BF.8000905@gmx.de> <1ccfd1c10909020927v74fe5ceekc91f4e4a4724a273@mail.gmail.com> <4A9E9FE9.7060107@redhat.com> <1ccfd1c10909021003o7b060a23ge700680cd75b07bf@mail.gmail.com> <4A9EA759.3050804@redhat.com> <4A9ECBAC.7060303@gmx.de> <1ccfd1c10909021329i34005b1bi5816e695d71a174d@mail.gmail.com> <1ccfd1c11003240132i35b9a24fldc8b4defb24364bb@mail.gmail.com> <4BAA4A67.30802@sun.com> <1ccfd1c11003241148g62745800o3f5f81b2a38e9215@mail.gmail.com> Message-ID: <4BAB9B54.7070102@gmx.de> Am 24.03.2010 19:48, schrieb Martin Buchholz: > On Wed, Mar 24, 2010 at 10:22, Xueming Shen wrote: > >> CR 6937842 Created, P4 
java/classes_lang Unreadable \uXXXX in javadoc >> > Thanks. > > >> The change fine. But maybe it would be better to "escape" the \u20ac as >> well, instead of >> simply deleting them. Not a big deal. >> > I prefer to leave them out, because the example has nothing to do > with exotic characters. > +1 -Ulf From Ulf.Zibis at gmx.de Thu Mar 25 20:26:26 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 21:26:26 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003091705k44447654wbdb311a48a1c7bb4@mail.gmail.com> <4B97E3BD.2000901@sun.com> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> <4BAA49D1.8010702@gmx.de> <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com> Message-ID: <4BABC6F2.70604@gmx.de> Am 24.03.2010 20:34, schrieb Martin Buchholz: > On Wed, Mar 24, 2010 at 10:20, Ulf Zibis wrote: > >> Am 23.03.2010 23:59, schrieb Martin Buchholz: >> > >> I too would like to see 8 spaces indentation on line breaks like: >> if (aaaaaaaaaaaaaaa> bbbbbbbbbbbbb&& >> ccccccccccccccc> ddddddddddddddddd) >> doSomething(); >> > This appears to be a new style (perhaps coming from the java IDEs?) > This rule is much older: http://java.sun.com/docs/codeconv/html/CodeConventions.doc3.html#248 But yes, I first saw this from NetBeans IDE formatting facility. > but it would be too pervasive a change for the JDK sources. > but wouldn't be a big deal for old stagers when coding new lines. ;-) > >> + opening braces at line end instead beginning a new line >> > Perhaps too difficult/controversial? > Yes, you will know that better from your team. 
See: http://java.sun.com/docs/codeconv/html/CodeConventions.doc10.html#182 > >> + blank line between package ... and import ... >> > This could be done, and automated. > > >> + no blank line between javadoc and class/method declaration >> > Yes. > > >> + 2 spaces after period >> > I agree with this style, but there is not enough consensus. > See comment for braces at line end. See: http://java.sun.com/j2se/javadoc/writingdoccomments/index.html#examples See: http://java.sun.com/j2se/javadoc/writingdoccomments/package-template > >> + proper indentation in @param @return @throws blocks >> > Perhaps too difficult to automate? > I can understand. For the Character class I did it manually. See: http://java.sun.com/j2se/javadoc/writingdoccomments/index.html#format I guess we can ignore the 2nd column tabulator, especially if names become looooong > >> + not too much use of braces e.g. for 1-line blocks (one can see more code >> lines on same screen space) >> > I agree with this personally, but there is violent disagreement > in the java programmer community. E.g. google's style guide > requires braces everywhere. > IMO, this should not be followed too bureaucratic. Think about labtop/netbook users or plenty open windows from IDE. Additionally scrolling becomes a kinda nightmare. And as you can see in the existing code base, things are never as bad as they seem, or the devil is not as black as he is painted. ;-) > >> + >> * @see #forDigit(int, int) >> * @see Integer#toString(int, int) >> instead: >> * @see java.lang.Character#forDigit(int, int) >> * @see java.lang.Integer#toString(int, int) >> > I did a global s/java\.lang\.// in Character.java. > As justified before, I would drop the current classes name. See: http://java.sun.com/j2se/javadoc/writingdoccomments/index.html#tag > >> + >> * range: U+DC00 through U+DFFF >> instead >> * range: 0xDC00 through 0xDFFF >> > I disagree. 
The U+ notation should be reserved for > Unicode characters (code points) and not UTF-16 > code units (which surrogates are). > I fully agree, but in the context, where I wanted to change this, the matter actually was about code points, not code units, and ... in case of Java char/UTF-16 code units, IMO we should use \u notation. 0x notation should only be used for none Unicode charsets binary values. > >> + >> {@link #isLowSurrogate(char)} >> {@link Character.UnicodeBlock} >> instead >> {@linkplain #isLowSurrogate(char) isLowSurrogate} >> {@link Character.UnicodeBlock UnicodeBlock} >> > I've removed the above. > BTW, I can't find any docu about {@linkplain ...}. What is the advantage against simple {@link ...}? Additionally I like to mention for class Character: - numerous javadoc blocks are only indented by 3 instead 4 spaces. - some code lines are indented by 5 instead 4 spaces. - I still dislike the space after a cast, refer to internal review ID of 1740052. - several UnicodeBlock declarations differ little in indentation/whitespace usage from the average. I would prefer: public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A = new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A", new String[] { "Supplementary Private Use Area-A", "SupplementaryPrivateUseArea-A" }); -Ulf From Ulf.Zibis at gmx.de Thu Mar 25 20:42:40 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 21:42:40 +0100 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> Message-ID: <4BABCAC0.4010704@gmx.de> Am 24.03.2010 09:24, schrieb Martin Buchholz: > Ulf, Sherman, Masayoshi, > here are changes for you to review. 
> Only the patch highSurrogate needs a separate bug filed > (and CCC, please) > I had just filed it 2 weeks ago, see: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6933322 -Ulf From Ulf.Zibis at gmx.de Thu Mar 25 20:52:42 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 21:52:42 +0100 Subject: review request 6933322 - Add methods highSurrogate(), lowSurrogate() to class Character In-Reply-To: <4BABCAC0.4010704@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> <4BABCAC0.4010704@gmx.de> Message-ID: <4BABCD1A.2040205@gmx.de> Updated topic. -Ulf Am 25.03.2010 21:42, schrieb Ulf Zibis: > Am 24.03.2010 09:24, schrieb Martin Buchholz: >> Ulf, Sherman, Masayoshi, >> here are changes for you to review. >> Only the patch highSurrogate needs a separate bug filed >> (and CCC, please) > > I had just filed it 2 weeks ago, see: > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6933322 > > -Ulf > > > From martinrb at google.com Thu Mar 25 21:27:55 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 25 Mar 2010 14:27:55 -0700 Subject: Review patches isBMPCodePoint/2/3 In-Reply-To: <4BAB680B.7050606@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> <4BAB680B.7050606@gmx.de> Message-ID: <1ccfd1c11003251427t37d024dfta9b951b4f8137c80@mail.gmail.com> On Thu, Mar 25, 2010 at 06:41, Ulf Zibis wrote: > Am 23.03.2010 19:19, schrieb Martin Buchholz: >> >> Ulf, >> >> Please do not delete methods in Surrogate.java >> (because we take compatibility seriously) >> > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint > > I still think, we should stick on Surrogate#isBMP 
for above compatibility > reason. > Otherwise we too should rename #neededFor etc. The difference is that isBMP was not provided with any officially supported JDK (i.e. Sun JDK 6). We take compatibility seriously, but not that seriously :) > Please add @author Ulf Zibis $ rg -l '@author.*Zibis' ./src/share/classes/java/lang/Character.java ./src/share/classes/java/lang/AbstractStringBuilder.java ./src/share/classes/java/lang/String.java ./src/share/classes/sun/nio/cs/Surrogate.java > and correct copyright date. I leave that up to the Sun release engineers. Martin From martinrb at google.com Thu Mar 25 21:47:06 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 25 Mar 2010 14:47:06 -0700 Subject: Review patches isBMPCodePoint/2/3 In-Reply-To: <4BAB680B.7050606@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> <4BAB680B.7050606@gmx.de> Message-ID: <1ccfd1c11003251447u697de858pbffc9db1a35cdb51@mail.gmail.com> Here's another minor performance tweak to public String(int[] codePoints, int offset, int count) { that optimizes for BMP. 
        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBMPCodePoint(c))
                ;
            else if (Character.isSupplementaryCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/

Martin

From Ulf.Zibis at gmx.de Thu Mar 25 22:19:34 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Thu, 25 Mar 2010 23:19:34 +0100
Subject: Review patches isBMPCodePoint/2/3
In-Reply-To: <1ccfd1c11003251447u697de858pbffc9db1a35cdb51@mail.gmail.com>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com>
	<4BA9256A.2020602@sun.com>
	<4BA92673.3030200@sun.com>
	<1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com>
	<4BA8E844.7080901@gmx.de>
	<4BA90188.3090902@gmx.de>
	<1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com>
	<4BAB680B.7050606@gmx.de>
	<1ccfd1c11003251447u697de858pbffc9db1a35cdb51@mail.gmail.com>
Message-ID: <4BABE176.4030107@gmx.de>

On 25.03.2010 22:47, Martin Buchholz wrote:
> Here's another minor performance tweak to
>
> public String(int[] codePoints, int offset, int count) {
>
> that optimizes for BMP.
>
>         // Pass 1: Compute precise size of char[]
>         int n = count;
>         for (int i = offset; i < end; i++) {
>             int c = codePoints[i];
>             if (Character.isBMPCodePoint(c))
>                 ;
>             else if (Character.isSupplementaryCodePoint(c))
>                 n++;
>             else throw new IllegalArgumentException(Integer.toString(c));
>         }
>

Yes, this is a valuable pattern, you found out.
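For readers following along, here is a self-contained sketch of the two-pass conversion that the quoted first pass belongs to. It is an illustration under stated assumptions, not the patch under review: the class name `CodePointsToChars` and the helper `toChars(int[], int, int)` are hypothetical, and it calls `Character.isBmpCodePoint`, the spelling used in released JDKs (7 and later), where the thread says `isBMPCodePoint`.

```java
public class CodePointsToChars {
    // Hypothetical helper mirroring the shape of String(int[], int, int).
    static char[] toChars(int[] codePoints, int offset, int count) {
        if (offset < 0 || count < 0 || offset > codePoints.length - count)
            throw new IndexOutOfBoundsException(
                "offset " + offset + ", count " + count);
        final int end = offset + count;

        // Pass 1: compute the precise size of the char[].
        // BMP code points are tested first because they are the common case.
        int n = count;
        for (int i = offset; i < end; i++) {
            final int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            if (!Character.isValidCodePoint(c))
                throw new IllegalArgumentException(Integer.toString(c));
            n++; // a supplementary code point needs two chars
        }

        // Pass 2: fill in the chars, forwards.
        final char[] v = new char[n];
        for (int i = offset, j = 0; i < end; i++) {
            final int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j++] = (char) c;
            else
                j += Character.toChars(c, v, j); // writes a surrogate pair
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 'a', 0x1F600, 'b' };
        System.out.println("chars=" + toChars(cps, 0, cps.length).length);
    }
}
```

Running `main` prints `chars=4`: two BMP characters plus one surrogate pair.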
I think, it could look smarter/more clear: if (Character.isBMPCodePoint(c)) continue; if (Character.isSupplementaryCodePoint(c)) n++; else throw new IllegalArgumentException(Integer.toString(c)); And this would be faster, as isSupplementaryCodePoint is not optimized for following isBMPCodePoint: if (Character.isBMPCodePoint(c)) continue; if (!Character.isValidCodePoint(c)) throw new IllegalArgumentException(Integer.toString(c)); n++; Before you go to the meeting, maybe scan the JDK for similar use cases, before I get addicted too, and don't forget to define c as final. It's enough, that I'm addicted from: // fill backwards for VM performance reasons, reduces register pressure, faster compare against 0 for (int i = end; n > 0; ) { int c = codePoints[--i]; if (Character.isBMPCodePoint(c)) v[--n] = (char)c; else Character.toSurrogates(c, v, n-=2); } -Ulf > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/ > > Martin > > > From Ulf.Zibis at gmx.de Thu Mar 25 22:36:05 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Thu, 25 Mar 2010 23:36:05 +0100 Subject: Review patches isBMPCodePoint/2/3 In-Reply-To: <1ccfd1c11003251427t37d024dfta9b951b4f8137c80@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> <4BAB680B.7050606@gmx.de> <1ccfd1c11003251427t37d024dfta9b951b4f8137c80@mail.gmail.com> Message-ID: <4BABE555.9070301@gmx.de> Am 25.03.2010 22:27, schrieb Martin Buchholz: > On Thu, Mar 25, 2010 at 06:41, Ulf Zibis wrote: > >> Am 23.03.2010 19:19, schrieb Martin Buchholz: >> >>> Ulf, >>> >>> Please do not delete methods in Surrogate.java >>> (because we take compatibility seriously) >>> >>> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint >> >> I still think, we should 
stick on Surrogate#isBMP for above compatibility >> reason. >> Otherwise we too should rename #neededFor etc. >> > The difference is that isBMP was not provided with any officially supported JDK > (i.e. Sun JDK 6). We take compatibility seriously, but not that seriously :) > You're right. So we can make it private, and no one would come to the idea to use it in-advisedly. > $ rg -l '@author.*Zibis' > ./src/share/classes/java/lang/Character.java > ./src/share/classes/java/lang/AbstractStringBuilder.java > ./src/share/classes/java/lang/String.java > ./src/share/classes/sun/nio/cs/Surrogate.java > Thanks, -Ulf From martinrb at google.com Thu Mar 25 23:33:44 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 25 Mar 2010 16:33:44 -0700 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4BABC6F2.70604@gmx.de> References: <4A95079A.8080803@gmx.de> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> <4BAA49D1.8010702@gmx.de> <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com> <4BABC6F2.70604@gmx.de> Message-ID: <1ccfd1c11003251633j3f735662m23dde18b8973bb9@mail.gmail.com> On Thu, Mar 25, 2010 at 13:26, Ulf Zibis wrote: > Am 24.03.2010 20:34, schrieb Martin Buchholz: >> >> On Wed, Mar 24, 2010 at 10:20, Ulf Zibis ?wrote: >> >>> >>> Am 23.03.2010 23:59, schrieb Martin Buchholz: >>> >> >> >>> >>> I too would like to see 8 spaces indentation on line breaks like: >>> ? ?if (aaaaaaaaaaaaaaa> ?bbbbbbbbbbbbb&& >>> ? ? ? ? ? ?ccccccccccccccc> ?ddddddddddddddddd) >>> ? ? ? ?doSomething(); >>> >> >> This appears to be a new style (perhaps coming from the java IDEs?) 
>>
>
> This rule is much older:
> http://java.sun.com/docs/codeconv/html/CodeConventions.doc3.html#248
> But yes, I first saw this from NetBeans IDE formatting facility.

Ahhh, thank you very much for this history lesson.

I have manually adjusted some source files as you requested,
but systematically fixing this particular coding style bug
is likely to be difficult.

>>>
>>> +
>>>     * @see    #forDigit(int, int)
>>>     * @see    Integer#toString(int, int)
>>> instead:
>>>     * @see     java.lang.Character#forDigit(int, int)
>>>     * @see     java.lang.Integer#toString(int, int)
>>>
>>
>> I did a global s/java\.lang\.// in Character.java.
>>
>
> As justified before, I would drop the current classes name.
> See: http://java.sun.com/j2se/javadoc/writingdoccomments/index.html#tag

For this particular source file, I am going to mildly disagree with you,
and keep as is.

>>>         * range: U+DC00 through U+DFFF
>>> instead
>>>         * range: 0xDC00 through 0xDFFF
>>>
>>
>> I disagree.  The U+ notation should be reserved for
>> Unicode characters (code points) and not UTF-16
>> code units (which surrogates are).
>>
>
> I fully agree, but in the context, where I wanted to change this, the matter
> actually was about code points, not code units, and ...
> in case of Java char/UTF-16 code units, IMO we should use \u notation.
> 0x notation should only be used for none Unicode charsets binary values.

Oh, I see.  You are right.  Patch coming up.

> BTW, I can't find any docu about {@linkplain ...}.
> What is the advantage against simple {@link ...}?

http://java.sun.com/j2se/1.4.2/docs/tooldocs/javadoc/whatsnew-1.4.html

> Additionally I like to mention for class Character:
> - numerous javadoc blocks are only indented by 3 instead 4 spaces.

Addressed in one of my current patches.

> - several UnicodeBlock declarations differ little in indentation/whitespace
> usage from the average. I would prefer:
>        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A =
>            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A",
>                             new String[] { "Supplementary Private Use Area-A",
>                                            "SupplementaryPrivateUseArea-A" });

See forthcoming patch.

From martinrb at google.com  Thu Mar 25 23:37:08 2010
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 25 Mar 2010 16:37:08 -0700
Subject: Minor improvements to Character.UnicodeBlock
Message-ID: <1ccfd1c11003251637u6716d8efq5f8e4846ca07093@mail.gmail.com>

Hi Masayoshi and Ulf,

I'd like you to do a code review.

There are actual doc bugs in the specification
of the surrogate unicode blocks.

Ulf convinced me that we should use U+ notation for
Unicode blocks.

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/UnicodeBlock/

Martin

From martinrb at google.com  Thu Mar 25 23:47:19 2010
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 25 Mar 2010 16:47:19 -0700
Subject: Review patches isBMPCodePoint/2/3
In-Reply-To: <4BABE176.4030107@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231119g19d9e4d2x881322e9ed6c9b27@mail.gmail.com> <4BAB680B.7050606@gmx.de> <1ccfd1c11003251447u697de858pbffc9db1a35cdb51@mail.gmail.com> <4BABE176.4030107@gmx.de>
Message-ID: <1ccfd1c11003251647i89937b9u2faf43e67135507@mail.gmail.com>

On Thu, Mar 25, 2010 at 15:19, Ulf Zibis wrote:
> Am 25.03.2010 22:47, schrieb Martin Buchholz:
>>
>> Here's another minor performance tweak to
>>
>>     public String(int[] codePoints, int offset, int count) {
>>
>> that optimizes for BMP.
>>
>>         // Pass 1: Compute precise size of char[]
>>         int n = count;
>>         for (int i = offset; i < end; i++) {
>>             int c = codePoints[i];
>>             if (Character.isBMPCodePoint(c))
>>                 ;
>>             else if (Character.isSupplementaryCodePoint(c))
>>                 n++;
>>             else throw new IllegalArgumentException(Integer.toString(c));
>>         }
>>
>
> Yes, this is a valuable pattern, you found out.
> I think, it could look smarter/more clear:
>
>            if (Character.isBMPCodePoint(c))
>                continue;
>            if (Character.isSupplementaryCodePoint(c))
>                n++;
>            else
>                throw new IllegalArgumentException(Integer.toString(c));
>
> And this would be faster, as isSupplementaryCodePoint is not optimized for
> following isBMPCodePoint:
>
>            if (Character.isBMPCodePoint(c))
>                continue;
>            if (!Character.isValidCodePoint(c))
>                throw new IllegalArgumentException(Integer.toString(c));
>            n++;

Done.

> Before you go to the meeting, maybe scan the JDK for similar use cases,

Sorry, that's your job.

> before I get addicted too, and don't forget to define c as final.

I see no reason to declare c as final here.

> It's enough, that I'm addicted from:
>        // fill backwards for VM performance reasons, reduces register
>        // pressure, faster compare against 0
>        for (int i = end; n > 0; ) {
>            int c = codePoints[--i];
>            if (Character.isBMPCodePoint(c))
>                v[--n] = (char)c;
>            else
>                Character.toSurrogates(c, v, n-=2);
>        }

Do you have actual evidence that this is faster?

I can see a different reason why - βουστροφηδόν
traversal is more cache-friendly.
http://en.wikipedia.org/wiki/Boustrophedon
Maybe those ancient Greeks were on to something.
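
The two passes discussed in this message can be put together in a small stand-alone sketch. It is not the JDK source: the class and method names are hypothetical, it assumes Java 7 or later (where the method under review here ended up public as Character.isBmpCodePoint), and it uses the public Character.highSurrogate/lowSurrogate in place of the package-private toSurrogates. Whether the backwards fill is actually faster is left open, as in the thread.

```java
public class CodePointsToChars {
    /**
     * Converts a range of code points to UTF-16 chars.
     * Pass 1 sizes the array, pass 2 fills it backwards.
     * Hypothetical helper, not the String constructor itself.
     */
    static char[] toChars(int[] codePoints, int offset, int count) {
        if (offset < 0 || count < 0 || offset > codePoints.length - count)
            throw new IndexOutOfBoundsException("offset " + offset + ", count " + count);
        final int end = offset + count;

        // Pass 1: compute the precise size of the char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;               // common case: one char, nothing to add
            if (!Character.isValidCodePoint(c))
                throw new IllegalArgumentException(Integer.toString(c));
            n++;                        // supplementary: needs a surrogate pair
        }

        // Pass 2: fill backwards, as suggested in the thread
        char[] v = new char[n];
        for (int i = end; n > 0; ) {
            int c = codePoints[--i];
            if (Character.isBmpCodePoint(c)) {
                v[--n] = (char) c;
            } else {
                v[--n] = Character.lowSurrogate(c);
                v[--n] = Character.highSurrogate(c);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 'a', 0x1F600, 'b' };
        String viaSketch = new String(toChars(cps, 0, cps.length));
        String viaJdk = new String(cps, 0, cps.length);
        System.out.println(viaSketch.equals(viaJdk));
    }
}
```

The BMP test comes first in both loops, so for text without supplementary characters neither the validity check nor the increment is executed.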
Martin From martinrb at google.com Thu Mar 25 23:55:06 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 25 Mar 2010 16:55:06 -0700 Subject: String.lastIndexOf confused by unpaired trailing surrogate In-Reply-To: <4BABCAC0.4010704@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> <4BABCAC0.4010704@gmx.de> Message-ID: <1ccfd1c11003251655p3588c58cl48c598d044f11d00@mail.gmail.com> On Thu, Mar 25, 2010 at 13:42, Ulf Zibis wrote: > Am 24.03.2010 09:24, schrieb Martin Buchholz: >> >> Ulf, Sherman, Masayoshi, >> here are changes for you to review. >> Only the patch highSurrogate needs a separate bug filed >> (and CCC, please) >> > > I had just filed it 2 weeks ago, see: > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6933322 Thank you very much. Webrev adjusted. From Ulf.Zibis at gmx.de Thu Mar 25 23:55:57 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 26 Mar 2010 00:55:57 +0100 Subject: Character.ulf-opto In-Reply-To: <4BAB9B0A.7030207@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> <4BAB9B0A.7030207@gmx.de> Message-ID: <4BABF80D.7050105@gmx.de> Am 25.03.2010 18:19, schrieb Ulf Zibis: > Am 24.03.2010 09:24, schrieb Martin Buchholz: > >> Very minor optimizations. Barely worth doing. >> Note my removal of the need to have n++ inside the loop. > > Overseen. Shame on me, as that's true Ulf-style. Yes, reduces > in/decrements on rare supplementary cases. > >> imported patch ulf-opto >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ulf-opto You didn't add my throws comments to offsetByCodePointsImpl and codePointCountImpl. Why? 
-Ulf

From martinrb at google.com  Thu Mar 25 23:59:05 2010
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 25 Mar 2010 16:59:05 -0700
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <4BAB8CE4.8020804@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003240032y77a6b77fi73b39ea698673860@mail.gmail.com> <4BAB8CE4.8020804@gmx.de>
Message-ID: <1ccfd1c11003251659y48b1f0efk2de9271bedc991bc@mail.gmail.com>

On Thu, Mar 25, 2010 at 09:18, Ulf Zibis wrote:
> Am 24.03.2010 08:32, schrieb Martin Buchholz:
>>
>> Hi Ulf,
>>
>> You have this interesting optimization:
>>
>>      public static boolean isSurrogate(char ch) {
>> -        return ch >= MIN_SURROGATE && ch < MAX_SURROGATE + 1;
>> +        return (ch -= MIN_SURROGATE) >= 0 && ch < MAX_SURROGATE + 1 - MIN_SURROGATE;
>>      }
>>
>> Do you have any evidence that hotspot can produce better code from this,
>> or that there is a measurable performance improvement?
>> Or was this just an experiment?
>>
>
> If isHighSurrogate and isSurrogate are used consecutive on same char, result
> of ch -= MIN_SURROGATE could be used for both.
> If isLowSurrogate and isSurrogate are used consecutive on same char, result
> of ch -= MAX_SURROGATE would fit better.
> If isHighSurrogate and isLowSurrogate are used consecutive on same char,
> result of ch -= MIN_LOW_SURROGATE would fit better.

It seems to me that you get the same opportunities for constant-folding.
Are you suggesting that there are x86 instructions that could be more
efficient if they have an argument value of MAX_SURROGATE-MIN_SURROGATE
than if they had an argument value of MAX_SURROGATE?

Martin

> I suggest using 1st pair in JDK library.
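
The two forms of the range check in the quoted diff can be compared side by side in a short sketch. The class and method names are hypothetical, and Character.isSurrogate(char) is assumed to be available (Java 7 or later). The sketch only demonstrates that both forms agree on every char value; it says nothing about which one HotSpot compiles to better code, which is the open question above.

```java
public class SurrogateRangeCheck {
    // Straightforward form: two comparisons against the range bounds.
    static boolean isSurrogatePlain(char ch) {
        return ch >= Character.MIN_SURROGATE && ch < Character.MAX_SURROGATE + 1;
    }

    // Subtract-then-compare form: char arithmetic is unsigned 16-bit, so
    // values below MIN_SURROGATE wrap around to large values and a single
    // upper-bound comparison is enough.
    static boolean isSurrogateSubtract(char ch) {
        char shifted = (char) (ch - Character.MIN_SURROGATE);
        return shifted < Character.MAX_SURROGATE + 1 - Character.MIN_SURROGATE;
    }

    public static void main(String[] args) {
        // Exhaustively compare both forms with the JDK method.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            boolean expected = Character.isSurrogate(ch);
            if (isSurrogatePlain(ch) != expected || isSurrogateSubtract(ch) != expected)
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(i));
        }
        System.out.println("both forms agree on all char values");
    }
}
```

Note that in the subtracting form the ">= 0" test from the diff is redundant for a char operand, because a char can never be negative; the wrap-around does the work.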
>
> -Ulf
>
>
>

From martinrb at google.com  Fri Mar 26 00:06:27 2010
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 25 Mar 2010 17:06:27 -0700
Subject: String.lastIndexOf confused by unpaired trailing surrogate
In-Reply-To: <4BAB9B0A.7030207@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> <4BAB9B0A.7030207@gmx.de>
Message-ID: <1ccfd1c11003251706o247368a2n281077554ae73a05@mail.gmail.com>

On Thu, Mar 25, 2010 at 10:19, Ulf Zibis wrote:
> Am 24.03.2010 09:24, schrieb Martin Buchholz:

>> Addition of highSurrogate and lowSurrogate
>> imported patch highSurrogate
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/highSurrogate
>>
>
> Looks good. Interesting workaround on my "Note:"
> I've reckoned with dropping my highSurrogate(char highCPWord, char
> lowCPWord).

Yeah, it's not the kind of method that tends to become a public API.

If you can demonstrate a real performance advantage for
highSurrogate(char,char) beyond just EUC_TW, esp in UTF_8,
then we can put it into Surrogate.java.

Martin

> Anyway I like to note, that I use that shortcut in my EUC_TW$Decoder
> twiddling. Following code:
>
>            da[dp] = Character.highSurrogate(0x20000 + c);
> results in (19 bytes):
>  0x00b8ae27: add    $0x20000,%ecx      ;*iadd
>                                        ; - sun.nio.cs.ext.D_21_d_narrow::decode at 98 (line 196)
>  0x00b8ae2d: mov    %ecx,%ebp
>  0x00b8ae2f: shr    $0xa,%ebp
>  0x00b8ae32: add    $0xd7c0,%ebp       ;*isub
>                                        ; - java.lang.Character::highSurrogate at 9 (line 3343)
>                                        ; - sun.nio.cs.ext.D_21_d_narrow::decode at 99 (line 196)
>
>            da[dp] = Character.highSurrogate((char)0x2, c);
> results in (9 bytes):
>  0x00b899e7: shr    $0xa,%ebp
>  0x00b899ea: add    $0xd840,%ebp       ;*isub
>                                        ; - java.lang.Character::highSurrogate at 14 (line 3365)
>                                        ; - sun.nio.cs.ext.D_22_d_n_fastSurrogate::decode at 97 (line 196)
>
>
>            dst.putInt(Character.highSurrogate((char)0x2, c)) << 16 | Character.lowSurrogate(c));
> would additionally increase performance. I'm still preparing the benchmark +
> disassembly.
>
> Those twiddling could be used in all surrogate processing charset coders,
> e.g. maybe true for UTF_x.
> If public, would be too useful for developers coding charset coders for
> exotic charsets via java.nio.charset.spi.CharsetProvider
>
> -Ulf
>
>
>

From martinrb at google.com  Fri Mar 26 00:55:15 2010
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 25 Mar 2010 17:55:15 -0700
Subject: Character.ulf-opto
In-Reply-To: <4BABF80D.7050105@gmx.de>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> <4BAB9B0A.7030207@gmx.de> <4BABF80D.7050105@gmx.de>
Message-ID: <1ccfd1c11003251755l493fbfcdi5c7d2195db607b@mail.gmail.com>

On Thu, Mar 25, 2010 at 16:55, Ulf Zibis wrote:
> Am 25.03.2010 18:19, schrieb Ulf Zibis:
>>
>> Am 24.03.2010 09:24, schrieb Martin Buchholz:
>>
>>> Very minor optimizations.  Barely worth doing.
>>> Note my removal of the need to have n++ inside the loop.
>>
>> Overseen. Shame on me, as that's true Ulf-style. Yes, reduces
>> in/decrements on rare supplementary cases.

Actually, it optimizes for BMP characters, doesn't it?

>>> imported patch ulf-opto
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ulf-opto
>
> You didn't add my throws comments to offsetByCodePointsImpl and
> codePointCountImpl. Why?

codePointCountImpl will never throw the way it is called now, I think.

offsetByCodePointsImpl throws explicitly, so a comment is not worthwhile.
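
For the highSurrogate disassembly quoted in the previous message, the arithmetic behind the two constants (0xd7c0 and 0xd840) can be sketched as follows. This is an illustration only: the class name is hypothetical, the two-argument method is a guess at the shape of the highSurrogate(char highCPWord, char lowCPWord) shortcut described there and is not a JDK API, and Character.highSurrogate(int) is assumed to be available (Java 7 or later).

```java
public class HighSurrogateSketch {
    // Offset used by the one-argument form:
    // 0xD800 - (0x10000 >>> 10) = 0xD7C0.
    static final int HIGH_OFFSET =
        Character.MIN_HIGH_SURROGATE - (Character.MIN_SUPPLEMENTARY_CODE_POINT >>> 10);

    // One-argument form: the caller has already added the plane to the code point.
    static char highSurrogate(int codePoint) {
        return (char) ((codePoint >>> 10) + HIGH_OFFSET);
    }

    // Hypothetical two-argument form: highWord is the upper 16 bits of the
    // code point, lowWord the lower 16 bits. With a constant highWord the
    // plane contribution (highWord << 6) folds into the offset.
    static char highSurrogate(char highWord, char lowWord) {
        return (char) ((lowWord >>> 10) + (HIGH_OFFSET + (highWord << 6)));
    }

    public static void main(String[] args) {
        // Check both forms against the JDK method for all of plane 2.
        for (int c = 0; c <= 0xFFFF; c++) {
            char expected = Character.highSurrogate(0x20000 + c);
            if (highSurrogate(0x20000 + c) != expected
                    || highSurrogate((char) 0x2, (char) c) != expected)
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(c));
        }
        // Folded constant for plane 2.
        System.out.println(Integer.toHexString(HIGH_OFFSET + (0x2 << 6)));
    }
}
```

For plane 2 the folded constant is 0xD7C0 + 0x80 = 0xD840, which matches the "add $0xd840" in the shorter listing; the saving comes from not having to add 0x20000 to the code point first.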
Martin From martinrb at google.com Fri Mar 26 01:04:57 2010 From: martinrb at google.com (Martin Buchholz) Date: Thu, 25 Mar 2010 18:04:57 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BAB8095.8030903@gmx.de> References: <4A95079A.8080803@gmx.de> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <4BA60A40.9050600@gmx.de> <1ccfd1c11003210916rad35d31wb0501f9bf960b07@mail.gmail.com> <4BA78011.2070504@gmx.de> <1ccfd1c11003221503r46e6bb78g241e2b07ff7f1b3c@mail.gmail.com> <4BAB8095.8030903@gmx.de> Message-ID: <1ccfd1c11003251804m6c651cdapeb1cdf16487fcbc4@mail.gmail.com> On Thu, Mar 25, 2010 at 08:26, Ulf Zibis wrote: > Am 22.03.2010 23:03, schrieb Martin Buchholz: >> >> On Mon, Mar 22, 2010 at 07:34, Ulf Zibis ?wrote: >> >>> >>> Am 21.03.2010 17:16, schrieb Martin Buchholz: >>> >> >> >>>> >>>> There is a debate about whether to reuse existing exception classes >>>> or to throw class-specific subclasses. ?IMO, IOOBE is a sufficiently >>>> expressive >>>> exception that I might have used just that, with expressive detail >>>> messages. >>>> >>>> >>> >>> I'm with you. Especially StringIndexOutOfBoundsException appears as >>> superfluous sugar to me. But we have it in the docs, so there is no way >>> to >>> get rid of it. >>> What do you think about to refactor most IOOBEs in String related classes >>> to >>> SIOOBEs? It would stay compatible to old Software, which still catches >>> IOOBEs, but would look more straight, tidy and clean and fix the below >>> mentioned bug. >>> >> >> Every change is an incompatible change, with a risk/benefit tradeoff. >> >> IMO there is no change to the exceptions thrown, or declared to be thrown, >> or to their detail messages, in the string classes that is worth the risk >> of incompatible change. 
>> > > Is somewhat reasonable, but what's the win of those "creative" variations on > exception messages _and_ types in AbstractStringBuilder? : > throw new StringIndexOutOfBoundsException(); > throw new StringIndexOutOfBoundsException(index); > throw new StringIndexOutOfBoundsException(start); > throw new StringIndexOutOfBoundsException("start > length()"); > throw new StringIndexOutOfBoundsException("start > end"); > throw new StringIndexOutOfBoundsException(end - start); > throw new StringIndexOutOfBoundsException(srcEnd); > throw new StringIndexOutOfBoundsException("srcBegin > srcEnd"); > throw new IndexOutOfBoundsException(); > throw new IndexOutOfBoundsException("start " + start + ", end " + end + ", > s.length() " + s.length()); > throw new IndexOutOfBoundsException("dstOffset "+dstOffset); It's a sad situation. It's certain someone is stupid enough to have written a program that depends on the details above, and the question is whether the improvement is worthwhile. A measurable performance improvement will make your case much stronger. >> (with the exception of when the implementation contradicts the spec, >> which is worth fixing) >> > > #insert(int, char[], in, int), uses System.arraycopy(). > If capacity doesn't suffice, it would throw an IOOBE, not SIOOBE > > #insert(int, CharSequence) states: > ? ? * @throws ? ? IndexOutOfBoundsException ?if the offset is invalid. > but (1) in fact throws SIOOBE in described case, if CharSequence is of > String. > and (2) additionally throws IOOBE in case of capacity overflow, which is not > mentioned. > > #insert(...) methods mix between (int index, ...) and (int dstIndex, ...) > without any reason. > > #substring(int) could be faster not using substring(int, int) detailed > bounds checking. > > #subSequence(int, int) in fact throws SIOOBE instead IOOBE. > > #appendCodePoint(int) could throw AIOOBE, similar to many other append > methods, capacity overflow behaviour is not documented. > > I stop here ... 
;-) Several of us have been here, wanting to improve these minor blemishes, and eventually deciding to put our effort elsewhere. I am in favor of at least fixing the detail messages and making the argument names more regular, but I you'll have to get the support of others as well. Sherman? Martin From Ulf.Zibis at gmx.de Fri Mar 26 17:56:06 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 26 Mar 2010 18:56:06 +0100 Subject: Minor improvements to Character.UnicodeBlock In-Reply-To: <1ccfd1c11003251637u6716d8efq5f8e4846ca07093@mail.gmail.com> References: <1ccfd1c11003251637u6716d8efq5f8e4846ca07093@mail.gmail.com> Message-ID: <4BACF536.4040604@gmx.de> Wow, that's indeed much better, than my simple whitespace correction. Looks good. I guess the doc corrections are still in process. My old patch may help to locate them: https://bugs.openjdk.java.net/attachment.cgi?id=146 -Ulf Am 26.03.2010 00:37, schrieb Martin Buchholz: > Hi Masayoshi and Ulf, > > I'd like you to do a code review. > > There are actual doc bugs in the specification > of the surrogate unicode blocks. > > Ulf convinced me that we should use U+ notation for > Unicode blocks. 
> > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/UnicodeBlock/ > > Martin > > > From Ulf.Zibis at gmx.de Fri Mar 26 18:08:05 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 26 Mar 2010 19:08:05 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003251633j3f735662m23dde18b8973bb9@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> <4BAA49D1.8010702@gmx.de> <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com> <4BABC6F2.70604@gmx.de> <1ccfd1c11003251633j3f735662m23dde18b8973bb9@mail.gmail.com> Message-ID: <4BACF805.2030004@gmx.de> Am 26.03.2010 00:33, schrieb Martin Buchholz: > On Thu, Mar 25, 2010 at 13:26, Ulf Zibis wrote: > > >> - several UnicodeBlock declarations differ little in indentation/whitespace >> usage from the average. I would prefer: >> public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A = >> new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A", >> new String[] { "Supplementary Private Use >> Area-A", >> "SupplementaryPrivateUseArea-A" >> }); >> > See forthcoming patch. 
> > > I've forgotten to mention: http://java.sun.com/docs/codeconv/html/CodeConventions.doc2.html#1852 Applies to location of static final int SIZE = 16; -Ulf From Ulf.Zibis at gmx.de Fri Mar 26 18:36:20 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 26 Mar 2010 19:36:20 +0100 Subject: Purge Surrogate usages In-Reply-To: <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> Message-ID: <4BACFEA4.2040502@gmx.de> Am 24.03.2010 00:50, schrieb Martin Buchholz: > I've added another mini-patch to my patch set. > > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint3 > > This deletes Surrogate.java, as Ulf wants, > except that ... it's another variant of Surrogate.java! > (which I didn't know existed) > > Uses of Surrogate.neededFor are all now changed to > Character.isSupplementaryCodePoint, as suggested by Ulf. > > I intend to fold all of the isBMPCodePoint patches together into one > before I commit them. > > Ulf, please review. > Looking at my old patch: https://bugs.openjdk.java.net/attachment.cgi?id=148&action=diff, I'm afraid, that there are some remaining references to Surrogate class in the code base: - cold imports - static final constants Can you declude them in your patch? 
-Ulf From martinrb at google.com Fri Mar 26 23:18:13 2010 From: martinrb at google.com (Martin Buchholz) Date: Fri, 26 Mar 2010 16:18:13 -0700 Subject: Purge Surrogate usages In-Reply-To: <4BACFEA4.2040502@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> <4BACFEA4.2040502@gmx.de> Message-ID: <1ccfd1c11003261618u20b631d3x5bc08de22d82a9f2@mail.gmail.com> On Fri, Mar 26, 2010 at 11:36, Ulf Zibis wrote: > Am 24.03.2010 00:50, schrieb Martin Buchholz: >> >> I've added another mini-patch to my patch set. >> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint3 >> >> This deletes Surrogate.java, as Ulf wants, >> except that ... it's another variant of Surrogate.java! >> (which I didn't know existed) >> >> Uses of Surrogate.neededFor are all now changed to >> Character.isSupplementaryCodePoint, as suggested by Ulf. >> >> I intend to fold all of the isBMPCodePoint patches together into one >> before I commit them. >> >> Ulf, please review. >> > > Looking at my old patch: > https://bugs.openjdk.java.net/attachment.cgi?id=148&action=diff, > I'm afraid, that there are some remaining references to Surrogate class in > the code base: > - cold imports > - static final constants > Can you declude them in your patch? OK, just to make you happy, more Surrogate cleansing. 
Two more mini-patches for you to review: To be qfolded into public-isBMPCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint4 to be qfolded into highSurrogate http://cr.openjdk.java.net/~martin/webrevs/openjdk7/highSurrogate2 Martin From kelly.ohair at sun.com Sat Mar 27 05:44:16 2010 From: kelly.ohair at sun.com (kelly.ohair at sun.com) Date: Sat, 27 Mar 2010 05:44:16 +0000 Subject: hg: jdk7/tl/langtools: 6938326: Use of "ant -diagnostics" a problem with ant 1.8.0, exit code 1 now Message-ID: <20100327054418.366324454D@hg.openjdk.java.net> Changeset: de6375751eb7 Author: ohair Date: 2010-03-26 22:37 -0700 URL: http://hg.openjdk.java.net/jdk7/tl/langtools/rev/de6375751eb7 6938326: Use of "ant -diagnostics" a problem with ant 1.8.0, exit code 1 now Reviewed-by: jjg ! make/Makefile From Ulf.Zibis at gmx.de Sat Mar 27 09:48:47 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sat, 27 Mar 2010 10:48:47 +0100 Subject: Purge Surrogate usages In-Reply-To: <1ccfd1c11003261618u20b631d3x5bc08de22d82a9f2@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> <4BACFEA4.2040502@gmx.de> <1ccfd1c11003261618u20b631d3x5bc08de22d82a9f2@mail.gmail.com> Message-ID: <4BADD47F.7090502@gmx.de> Am 27.03.2010 00:18, schrieb Martin Buchholz: > On Fri, Mar 26, 2010 at 11:36, Ulf Zibis wrote: > >> >> Looking at my old patch: >> https://bugs.openjdk.java.net/attachment.cgi?id=148&action=diff, >> I'm afraid, that there are some remaining references to Surrogate class in >> the code base: >> - cold imports >> - static final constants >> Can you declude them in your patch? >> > > OK, just to make you happy, more Surrogate cleansing. 
> Two more mini-patches for you to review: > > To be qfolded into public-isBMPCodePoint > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint4 > Looks good. Maybe you could rename isBMPCodePoint* to public-isBMPCodePoint* I often mix, that isBMPCodePoint seems to be precedent of isBMPCodePoint* > to be qfolded into highSurrogate > http://cr.openjdk.java.net/~martin/webrevs/openjdk7/highSurrogate2 > You additionally could add: * Use of {@link Character#high/lowSurrogate} is generally preferred. and propagate those methods to Character class. -Ulf From martinrb at google.com Sat Mar 27 15:51:45 2010 From: martinrb at google.com (Martin Buchholz) Date: Sat, 27 Mar 2010 08:51:45 -0700 Subject: Purge Surrogate usages In-Reply-To: <4BADD47F.7090502@gmx.de> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> <4BACFEA4.2040502@gmx.de> <1ccfd1c11003261618u20b631d3x5bc08de22d82a9f2@mail.gmail.com> <4BADD47F.7090502@gmx.de> Message-ID: <1ccfd1c11003270851k24dad2b6h73ce528d84348999@mail.gmail.com> On Sat, Mar 27, 2010 at 02:48, Ulf Zibis wrote: > Am 27.03.2010 00:18, schrieb Martin Buchholz: > You additionally could add: > > ? ? * Use of {@link Character#high/lowSurrogate} is generally preferred. > > and propagate those methods to Character class. Thanks. Done. 
(I think "delegate" expresses your intent better than "propagate") Martin From Ulf.Zibis at gmx.de Sat Mar 27 21:08:07 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sat, 27 Mar 2010 22:08:07 +0100 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <1ccfd1c11003221527q29f61f7u700344a99d293ceb@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B9FE4DD.1090405@sun.com> <1ccfd1c11003161409u923d21ya30acd8b104ee9ac@mail.gmail.com> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> <4BA78CE8.9020107@gmx.de> <1ccfd1c11003221527q29f61f7u700344a99d293ceb@mail.gmail.com> Message-ID: <4BAE73B7.40101@gmx.de> Am 22.03.2010 23:27, schrieb Martin Buchholz: > Ulf, > > I'd like to start a mq patch containing changes to > the String exception handling in the string classes. > Please provide me with a patch that uses the > blessed conventional names from Preconditions.java. > Here are my first patches for start. In the 2nd patch I did additional speed-ups, corrections and renamings. Please review. -Ulf > For the version that checks an offset and length for > containment within a larger sequence, I would prefer > the name "checkSubsequence", for example > > private static void checkSubsequence(int start, int len, int size) > > Please make sure that there are sufficient tests in > test/java/lang/String to ensure that you are not > inadvertently making changes to the exceptions thrown. > > I note that test/java/lang/String/{Exceptions,Supplementary} > do try to test exception handling, but do not appear to > test for the *exact* class of the exception thrown, > nor the detail message of the exception. > When those tests were written, compatibility was less important. 
> > Please adapt my > test/java/util/ArrayList/RangeCheckMicroBenchmark.java > to test string classes instead. > There is a good chance that you can demonstrate > a performance improvement on ordinary String operations! > > Thanks, > > Martin > > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: String_Preconditions URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: String_Preconditions2 URL: From Ulf.Zibis at gmx.de Sat Mar 27 22:15:44 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sat, 27 Mar 2010 23:15:44 +0100 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <1ccfd1c11003251633j3f735662m23dde18b8973bb9@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4B99373E.40502@gmx.de> <1ccfd1c11003111138n3c666e91q60079121176ddd@mail.gmail.com> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> <4BAA49D1.8010702@gmx.de> <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com> <4BABC6F2.70604@gmx.de> <1ccfd1c11003251633j3f735662m23dde18b8973bb9@mail.gmail.com> Message-ID: <4BAE838F.1040301@gmx.de> On 26.03.2010 00:33, Martin Buchholz wrote: > On Thu, Mar 25, 2010 at 13:26, Ulf Zibis wrote: > >> On 24.03.2010 20:34, Martin Buchholz wrote: >> >>> On Wed, Mar 24, 2010 at 10:20, Ulf Zibis wrote: >>> >>> >>>> I too would like to see 8 spaces indentation on line breaks like: >>>> if (aaaaaaaaaaaaaaa > bbbbbbbbbbbbb && >>>> ccccccccccccccc > ddddddddddddddddd) >>>> doSomething(); >>>> >>>> >>> This appears to be a new style (perhaps coming from the java IDEs?) >>> >>> >> This rule is much older: >> http://java.sun.com/docs/codeconv/html/CodeConventions.doc3.html#248 >> But yes, I first saw this in the NetBeans IDE formatting facility.
>> > Ahhh, thank you very much for this history lesson. > > I have manually adjusted some source files as you requested, > but systematically fixing this particular coding style bug > is likely to be difficult. > NetBeans IDE does a good job of that. Those other formatting tasks could probably be handled well there too. -Ulf From Ulf.Zibis at gmx.de Sat Mar 27 22:37:57 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sat, 27 Mar 2010 23:37:57 +0100 Subject: Character.ulf-opto In-Reply-To: <1ccfd1c11003251755l493fbfcdi5c7d2195db607b@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA56749.8020506@gmx.de> <1ccfd1c11003240124v24db88c0wd8c05396a92a6fef@mail.gmail.com> <4BAB9B0A.7030207@gmx.de> <4BABF80D.7050105@gmx.de> <1ccfd1c11003251755l493fbfcdi5c7d2195db607b@mail.gmail.com> Message-ID: <4BAE88C5.2040205@gmx.de> On 26.03.2010 01:55, Martin Buchholz wrote: > On Thu, Mar 25, 2010 at 16:55, Ulf Zibis wrote: > >> On 25.03.2010 18:19, Ulf Zibis wrote: >> >>> On 24.03.2010 09:24, Martin Buchholz wrote: >>> >>> >>>> Very minor optimizations. Barely worth doing. >>>> Note my removal of the need to have n++ inside the loop. >>>> >>> Overlooked. Shame on me, as that's true Ulf-style. Yes, reduces >>> in/decrements on rare supplementary cases. >>> > Actually, it optimizes for BMP characters, doesn't it? > Yes, of course. > >>>> imported patch ulf-opto >>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ulf-opto >>>> >> You didn't add my throws comments to offsetByCodePointsImpl and >> codePointCountImpl. Why? >> > codePointCountImpl will never throw the way it is called now, I think. > It seems you are right. > offsetByCodePointsImpl throws explicitly, so a comment is not worthwhile. > Well, those comments had been valuable in my research into possible exception doc bugs.
-Ulf From Ulf.Zibis at gmx.de Sat Mar 27 22:43:27 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Sat, 27 Mar 2010 23:43:27 +0100 Subject: Purge Surrogate usages In-Reply-To: <1ccfd1c11003270851k24dad2b6h73ce528d84348999@mail.gmail.com> References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA92673.3030200@sun.com> <1ccfd1c11003221536k7fa58a39jbbf4dfcfa34b01c0@mail.gmail.com> <4BA8E844.7080901@gmx.de> <4BA90188.3090902@gmx.de> <1ccfd1c11003231650g48dc0fb8gd445c46699433377@mail.gmail.com> <4BACFEA4.2040502@gmx.de> <1ccfd1c11003261618u20b631d3x5bc08de22d82a9f2@mail.gmail.com> <4BADD47F.7090502@gmx.de> <1ccfd1c11003270851k24dad2b6h73ce528d84348999@mail.gmail.com> Message-ID: <4BAE8A0F.8090602@gmx.de> On 27.03.2010 16:51, Martin Buchholz wrote: > On Sat, Mar 27, 2010 at 02:48, Ulf Zibis wrote: > > >> You additionally could add: >> >> * Use of {@link Character#high/lowSurrogate} is generally preferred. >> >> and propagate those methods to Character class. >> > Thanks. Done. > > (I think "delegate" expresses your intent better than "propagate") Thanks for your help with the wording. -Ulf From kevin.l.stern at gmail.com Sun Mar 28 11:55:12 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Sun, 28 Mar 2010 06:55:12 -0500 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. Message-ID: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> I put together the following class, ChunkedArrayList, in response to Martin's request (excerpted from an earlier conversation on this web board) below. https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 Thoughts? Regards, Kevin On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz wrote: It surely is not a good idea to use a single backing array for huge arrays. As you point out, it's up to 32GB for just one object.
But the core JDK doesn't offer a suitable alternative for users who need very large collections. It would have been more in the spirit of Java to have a collection class instead of ArrayList that was not fastest at any particular operation, but had excellent asymptotic behaviour, based on backing arrays containing backing arrays. But: - no such excellent class has been written yet (or please point me to such a class) - even if it were, such a best-of-breed-general-purpose List implementation would probably need to be introduced as a separate class, because of the performance expectations of existing implementations. In the meantime, we have to maintain what we got, and that includes living with arrays and classes that wrap them. Changing the spec is unlikely to succeed.. Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.l.stern at gmail.com Sun Mar 28 12:28:22 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Sun, 28 Mar 2010 07:28:22 -0500 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> Message-ID: <1704b7a21003280528s64fe6f32hef68b45865bd3223@mail.gmail.com> Please ignore the lack of custom serialization, I'll certainly tidy up the code if there is interest in it. On Sun, Mar 28, 2010 at 6:55 AM, Kevin L. Stern wrote: > I put together the following class, ChunkedArrayList, in response to > Martin's request (excerpted from an earlier conversation on this web board) > below. > > > https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 > > Thoughts? > > Regards, > > Kevin > > > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz > wrote: > > It surely is not a good idea to use a single backing array > for huge arrays. 
As you point out, it's up to 32GB > for just one object. But the core JDK > doesn't offer a suitable alternative for users who need very > large collections. > > It would have been more in the spirit of Java to have a > collection class instead of ArrayList that was not fastest at > any particular operation, but had excellent asymptotic behaviour, > based on backing arrays containing backing arrays. > But: > - no such excellent class has been written yet > (or please point me to such a class) > - even if it were, such a best-of-breed-general-purpose > List implementation would probably need to be introduced as a > separate class, because of the performance expectations of > existing implementations. > > In the meantime, we have to maintain what we got, > and that includes living with arrays and classes that wrap them. > > Changing the spec is unlikely to succeed.. > > Martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.l.stern at gmail.com Sun Mar 28 14:19:02 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Sun, 28 Mar 2010 09:19:02 -0500 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: <1704b7a21003280528s64fe6f32hef68b45865bd3223@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1704b7a21003280528s64fe6f32hef68b45865bd3223@mail.gmail.com> Message-ID: <1704b7a21003280719h60a570b9te5b1296516f2ea29@mail.gmail.com> Apologies, please use this link instead; this way you do not need to download the file (it displays as a document). https://docs.google.com/Doc?docid=0Aabrz3MPBDdhZGdrbnEzejdfM2M3am5wM2Mz&hl=en Regards, Kevin On Sun, Mar 28, 2010 at 7:28 AM, Kevin L. Stern wrote: > Please ignore the lack of custom serialization, I'll certainly tidy up the > code if there is interest in it. > > > On Sun, Mar 28, 2010 at 6:55 AM, Kevin L. 
Stern wrote: > >> I put together the following class, ChunkedArrayList, in response to >> Martin's request (excerpted from an earlier conversation on this web board) >> below. >> >> >> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 >> >> Thoughts? >> >> Regards, >> >> Kevin >> >> >> On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz >> wrote: >> >> It surely is not a good idea to use a single backing array >> for huge arrays. As you point out, it's up to 32GB >> for just one object. But the core JDK >> doesn't offer a suitable alternative for users who need very >> large collections. >> >> It would have been more in the spirit of Java to have a >> collection class instead of ArrayList that was not fastest at >> any particular operation, but had excellent asymptotic behaviour, >> based on backing arrays containing backing arrays. >> But: >> - no such excellent class has been written yet >> (or please point me to such a class) >> - even if it were, such a best-of-breed-general-purpose >> List implementation would probably need to be introduced as a >> separate class, because of the performance expectations of >> existing implementations. >> >> In the meantime, we have to maintain what we got, >> and that includes living with arrays and classes that wrap them. >> >> Changing the spec is unlikely to succeed.. >> >> Martin >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From xuelei.fan at sun.com Mon Mar 29 05:51:14 2010 From: xuelei.fan at sun.com (xuelei.fan at sun.com) Date: Mon, 29 Mar 2010 05:51:14 +0000 Subject: hg: jdk7/tl/jdk: 6693917: regression tests need to update for supporting ECC on solaris 11 Message-ID: <20100329055200.65A9D44814@hg.openjdk.java.net> Changeset: 31517a0345d1 Author: xuelei Date: 2010-03-29 13:27 +0800 URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/31517a0345d1 6693917: regression tests need to update for supporting ECC on solaris 11 Reviewed-by: weijun ! test/sun/security/ssl/etc/keystore ! test/sun/security/ssl/etc/truststore ! test/sun/security/ssl/sanity/ciphersuites/CheckCipherSuites.java From martinrb at google.com Mon Mar 29 07:23:35 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 29 Mar 2010 00:23:35 -0700 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> Message-ID: <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern wrote: > I put together the following class, ChunkedArrayList, in response to > Martin's request (excerpted from an earlier conversation on this web board) > below. > > https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 > > Thoughts? This class is well on the way to what I was thinking of, but my bar for acceptance is a little higher. In particular, I don't want to add yet another class that can replace some, but not all, of the existing list implementations. Most obviously, I don't want to lose the ability, introduced in ArrayDeque, of having O(1) insertion at the front and end of the collection. Perhaps you can do this by having one "arraylet" always be shared by both ends, which grow towards each other in circular fashion.
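For illustration, the basic structure under discussion in this thread, a list stored in several small arrays instead of one large one, might be sketched as follows. This is a minimal sketch under stated assumptions, not Kevin's ChunkedArrayList (whose source is only linked above): the class name ChunkedList, the 1024-element chunk size, and the append-only API are all hypothetical, and it does not attempt the double-ended growth described in the preceding paragraph.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: a list stored in fixed-size chunks rather than one large array. */
final class ChunkedList<E> {
    private static final int SHIFT = 10;          // assumed chunk size: 1 << 10 = 1024 elements
    private static final int CHUNK = 1 << SHIFT;
    private static final int MASK = CHUNK - 1;

    // Top-level directory of chunks. When the list grows, only this
    // directory of references may be copied, never the elements.
    private final List<Object[]> chunks = new ArrayList<Object[]>();
    private int size;

    int size() {
        return size;
    }

    /** Appends at the end, allocating a fresh chunk when the last one is full. */
    void add(E e) {
        if ((size & MASK) == 0) {
            // size is a multiple of CHUNK: no chunk yet, or the last chunk is full
            chunks.add(new Object[CHUNK]);
        }
        chunks.get(size >>> SHIFT)[size & MASK] = e;
        size++;
    }

    /** Random access: the high bits select the chunk, the low bits the slot within it. */
    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return (E) chunks.get(index >>> SHIFT)[index & MASK];
    }
}
```

Because the chunk size is a power of two, locating an element costs one shift and one mask on top of an ordinary array access, and no single allocation is larger than one chunk.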
I also think we should shrink the array when necessary, so that occupancy never drops below, say 50%. Perhaps we should also have amortized O(1) insertion in the middle by using a "gap array". Probably more important for byte/char collections like StringBuilder... I believe there are more complicated implementations that permit O(1) insertions at the ends, and only O(sqrt(N)) space overhead. .... E.g. Use your favorite search engine to do some research on: Resizable arrays in optimal time and space Succinct dynamic data structures Meta-comment: there is not enough transfer of academic research results into practice; I would think this is one of the responsibilities of the researchers. I presume you'd be willing to sign a contributor agreement to get your changes into the JDK someday. Martin > Regards, > > Kevin > > > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz > wrote: > > It surely is not a good idea to use a single backing array > for huge arrays. As you point out, it's up to 32GB > for just one object. But the core JDK > doesn't offer a suitable alternative for users who need very > large collections. > > It would have been more in the spirit of Java to have a > collection class instead of ArrayList that was not fastest at > any particular operation, but had excellent asymptotic behaviour, > based on backing arrays containing backing arrays. > But: > - no such excellent class has been written yet > (or please point me to such a class) > - even if it were, such a best-of-breed-general-purpose > List implementation would probably need to be introduced as a > separate class, because of the performance expectations of > existing implementations. > > In the meantime, we have to maintain what we got, > and that includes living with arrays and classes that wrap them. > > Changing the spec is unlikely to succeed.. > >
Martin > From opinali at gmail.com Mon Mar 29 15:08:36 2010 From: opinali at gmail.com (Osvaldo Doederlein) Date: Mon, 29 Mar 2010 12:08:36 -0300 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> Message-ID: Initially, it would be good enough to replace only java.util.ArrayList with minimal overhead. ArrayList does not support efficient add-at-front or other enhancements of ArrayDeque; but ArrayList is still a much more important and popular collection, it's the primary "straight replacement for primitive arrays" and I guess it should continue with that role. One problem of both ArrayList and primitive arrays is that they're not GC-friendly; huge arrays suck for GC. IBM's realtime Metronome collector uses the "arraylet" structure for primitive arrays, so there is a hard upper-limit on object size (well, at least as long as apps don't define classes with thousands of fields, I guess). This avoids the whole issue of "large objects" which permits a simpler heap layout, better incremental GC, etc. There are two tradeoffs. First, some overhead for all array operations - but this is the least important, especially as the arraylet trick is implemented at the VM level so we can rely on the JIT to perform extra optimizations (e.g., unrolling and other loop optimizations; bounds-check elimination and other array opts may be arraylet-aware, so most overhead is cancelled or at least lifted out of loop bodies and hot paths). Second, no support at all for huge arrays is incompatible with native code that expects a contiguous layout, e.g.
for the byte[]s inside Images - so all these uses must be identified and fixed somehow, e.g. using DirectBuffers, or changing the native layer to understand arraylets (image libraries may be OK with banding), or in the worst case just copying the data to/from a contiguous, native array (in most cases I think this copy already happens for other reasons, so there's no extra copy, just a slightly more expensive copy). Now we're talking about some big VM change of course, but HotSpot would not be the first production VM to do this so maybe it's a viable project for the future, especially as Sun plans to keep raising the bar in incremental/realtime GC (G1 may already be a great step forward, but huge arrays will always spoil the fun for many apps). In summary I think the ChunkedArrayList would serve only as a stopgap solution, with extremely limited benefits unless it's good enough that, as Martin says, it can replace more List implementations. And I'll even add, replace many other collections too - e.g. a giant HashMap will contain a giant Entry[] array inside it, I want this array to be chunked too (ConcurrentHashMap already is, but it's tuned up differently, for concurrent usage - and that's just one example anyway). And by "replace" I further mean "change the implementation of all existing collections that are array-backed", not "offer new collections" as the latter will only be heavily used ten years from today when JavaSE7 is considered the minimum JavaSE release to be supported by apps/libraries/frameworks/containers/etc. Even then, the benefits will be clearly inferior to what can be achieved by VM-level arraylets. A+ Osvaldo 2010/3/29 Martin Buchholz > On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern > wrote: > > I put together the following class, ChunkedArrayList, in response to > > Martin's request (excerpted from an earlier conversation on this web > board) > > below.
> > > > > https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 > > > > Thoughts? > > This class is well on the way to what I was thinking of, > but my bar for acceptance is a little higher. > In particular, I don't want to add yet another class > that can replace some, but not all, of the existing > list implementations. > > Most obviously, I don't want to lose the ability, > introduced in ArrayDeque, of having O(1) insertion > at the front and end of the collection. > Perhaps you can do this by having one "arraylet" > always be shared by both ends, which > grow towards each other in circular fashion. > > I also think we should shrink the array when > necessary, so that occupancy never drops > below, say 50%. > > Perhaps we should also have amortized O(1) > insertion in the middle by using a "gap array". > Probably more important for byte/char collections > like StringBuilder... > > I believe there are more complicated implementations > that permit O(1) insertions at the ends, and only > O(sqrt(N)) space overhead. > > .... > > E.g. Use your favorite search engine to do > some research on: > Resizable arrays in optimal time and space > Succinct dynamic data structures > > Meta-comment: there is not enough transfer of > academic research results into practice; I would think this > is one of the responsibilities of the researchers. > > I presume you'd be willing to sign a > contributor agreement to get your changes into > the JDK someday. > > Martin > > > Regards, > > > > Kevin > > > > > > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz > > wrote: > > > > It surely is not a good idea to use a single backing array > > for huge arrays. As you point out, it's up to 32GB > > for just one object. But the core JDK > > doesn't offer a suitable alternative for users who need very > > large collections.
> > > > It would have been more in the spirit of Java to have a > > collection class instead of ArrayList that was not fastest at > > any particular operation, but had excellent asymptotic behaviour, > > based on backing arrays containing backing arrays. > > But: > > - no such excellent class has been written yet > > (or please point me to such a class) > > - even if it were, such a best-of-breed-general-purpose > > List implementation would probably need to be introduced as a > > separate class, because of the performance expectations of > > existing implementations. > > > > In the meantime, we have to maintain what we got, > > and that includes living with arrays and classes that wrap them. > > > > Changing the spec is unlikely to succeed.. > > > > Martin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martinrb at google.com Mon Mar 29 21:46:00 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 29 Mar 2010 14:46:00 -0700 Subject: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint In-Reply-To: <4BAE838F.1040301@gmx.de> References: <4A95079A.8080803@gmx.de> <4B995D22.2020507@gmx.de> <1ccfd1c11003121504u5761c160t45513c98d3cec816@mail.gmail.com> <4BA8B285.1040403@gmx.de> <1ccfd1c11003231559x25aef975hca5b81e9dfe9b09c@mail.gmail.com> <4BAA49D1.8010702@gmx.de> <1ccfd1c11003241234s5f7c4ec5l9570705d51892567@mail.gmail.com> <4BABC6F2.70604@gmx.de> <1ccfd1c11003251633j3f735662m23dde18b8973bb9@mail.gmail.com> <4BAE838F.1040301@gmx.de> Message-ID: <1ccfd1c11003291446j20a03c5at3becedb35dff707f@mail.gmail.com> On Sat, Mar 27, 2010 at 15:15, Ulf Zibis wrote: > On 26.03.2010 00:33, Martin Buchholz wrote: >> >> On Thu, Mar 25, 2010 at 13:26, Ulf Zibis wrote: >> >>> >>> On 24.03.2010 20:34, Martin Buchholz wrote: >>> >>>> >>>> On Wed, Mar 24, 2010 at 10:20, Ulf Zibis wrote: >>>> >>>> >>>>> >>>>> I too would like to see 8 spaces indentation on line breaks like: >>>>>
if (aaaaaaaaaaaaaaa > bbbbbbbbbbbbb && >>>>> ccccccccccccccc > ddddddddddddddddd) >>>>> doSomething(); >>>>> >>>>> >>>> >>>> This appears to be a new style (perhaps coming from the java IDEs?) >>>> >>>> >>> >>> This rule is much older: >>> http://java.sun.com/docs/codeconv/html/CodeConventions.doc3.html#248 >>> But yes, I first saw this in the NetBeans IDE formatting facility. >>> >> >> Ahhh, thank you very much for this history lesson. >> >> I have manually adjusted some source files as you requested, >> but systematically fixing this particular coding style bug >> is likely to be difficult. >> > > NetBeans IDE does a good job of that. Those other formatting tasks > could probably be handled well there too. One of the standard counter-arguments to pervasive code cleaning changes is the difficulty of merging that other developers run into. The standard counter-counter-argument to *that* is "we provide an automated tool you can run over your own code to eliminate merge conflicts", but that does actually require an automated tool, and IDEs are typically not very scriptable. So I generally greatly prefer changes that can be automated (typically a perl script in my usage), where the automation tool can be checked in to the repo.
Martin From martinrb at google.com Mon Mar 29 22:17:22 2010 From: martinrb at google.com (Martin Buchholz) Date: Mon, 29 Mar 2010 15:17:22 -0700 Subject: review request for 6798511/6860431: Include functionality of Surrogate in Character In-Reply-To: <4BAE73B7.40101@gmx.de> References: <4A95079A.8080803@gmx.de> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> <4BA78CE8.9020107@gmx.de> <1ccfd1c11003221527q29f61f7u700344a99d293ceb@mail.gmail.com> <4BAE73B7.40101@gmx.de> Message-ID: <1ccfd1c11003291517l4a46f260s5c78639244da6420@mail.gmail.com> Hi Ulf, I will sponsor your initiative to refactor the exception handling. Before this can go in, we should have just the exception handling changes contained in one patch, since it is such a big change. I'd like you to try to port my related RangeCheckMicroBenchmark to string handling and hopefully demonstrate some measurable performance improvement. ---- In the code below, I think some if's need to be changed to "else if"s. (but don't just fix it - make sure we have a failing test with your current code (you do run the regression tests religiously, right?)) + static void checkPositionIndexes(int srcLen, int begin, int end) { + assert (srcLen >= 0); + int index; + if (begin < 0) + index = begin; + if (end > srcLen) + index = begin>srcLen ? begin:end-begin; + if (end < begin) + index = begin>srcLen ? begin : end<0 ? end : end-begin; + else + return; + throw new StringIndexOutOfBoundsException(index); ---- it's => its + * following values are referred in it's message: ---- badIndex might be a better name for "index" below. 
+ int index; + if (begin < 0) + index = begin; ---- Run at least the following tests (below is how I test this code myself) /home/martinrb/jct-tools/3.2.2_03/linux/bin/jtreg -v:nopass,fail -vmoption:-enablesystemassertions -automatic "-k:\!ignore" -testjdk:/usr/local/google/home/martin/ws/upstream/build/linux-amd64 test/sun/nio/cs test/java/nio/charset test/java/lang/StringCoding test/java/lang/StringBuilder test/java/lang/StringBuffer test/java/lang/String test/java/lang/Appendable ---- I think returning len below is too confusing. Just make the return type void. + int checkPositionIndex(int index) { + int len = count; // not sure, if JIT recognizes that it's final ? + checkPositionIndex(len, index); + return len; + } ---- We will need a significant merge once I commit related changes. ---- Thanks, Martin On Sat, Mar 27, 2010 at 14:08, Ulf Zibis wrote: > Am 22.03.2010 23:27, schrieb Martin Buchholz: >> >> Ulf, >> >> I'd like to start a mq patch containing changes to >> the String exception handling in the string classes. >> Please provide me with a patch that uses the >> blessed conventional names from Preconditions.java. >> > > Here are my first patches for start. > In the 2nd patch I did additional speed-ups, corrections and renamings. > > Please review. > > -Ulf > > >> For the version that checks an offset and length for >> containment within a larger sequence, I would prefer >> the name "checkSubsequence", for example >> >> private static void checkSubsequence(int start, int len, int size) >> >> Please make sure that there are sufficient tests in >> test/java/lang/String to ensure that you are not >> inadvertently making changes to the exceptions thrown. >> >> I note that test/java/lang/String/{Exceptions,Supplementary} >> do try to test exception handling, but do not appear to >> test for the *exact* class of the exception thrown, >> nor the detail message of the exception. >> When those tests were written, compatibility was less important. 
>> >> Please adapt my >> test/java/util/ArrayList/RangeCheckMicroBenchmark.java >> to test string classes instead. >> There is a good chance that you can demonstrate >> a performance improvement on ordinary String operations! >> >> Thanks, >> >> Martin >> >> >> > From kevin.l.stern at gmail.com Mon Mar 29 23:24:26 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Mon, 29 Mar 2010 18:24:26 -0500 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> Message-ID: <1704b7a21003291624n740dbc8bibbc15e1b8e0291d4@mail.gmail.com> One advantage of this approach over the VM approach is that no data copy is necessary when the capacity of the data structure is expanded (new arrays are tacked on to the end of the top level array of references) or contracted (arrays of null are removed from the top level array of references) aside from the (any?) copy of the top level array of references. One way to address your concern, though, is to create a ChunkedArray class that simply wraps an array and provides expand and contract functionality. This could be reused in any/all collections. On Mon, Mar 29, 2010 at 10:08 AM, Osvaldo Doederlein wrote: > Initially, it would be good enough to replace only java.util.ArrayList with > minimal overhead. ArrayList does not support efficient add-at-front or other > enhancements of ArrayDeque; but ArrayList is still a much more important and > popular collection, it's the primary "straight replacement for primitive > arrays" and I guess it should continue with that role. > > One problem of both ArrayList and primitive arrays is that they're not > GC-friendly; huge arrays suck for GC.
IBM's realtime Metronome collector > uses the "arraylet" structure for primitive arrays, so there is a hard > upper-limit on object size (well, at least as long as apps don't define > classes with thousands of fields, I guess). This avoids the whole issue of > "large objects" which permits a simpler heap layout, better incremental GC, > etc. There are two tradeoffs. First, some overhead for all array operations > - but this is the least important, especially as the arraylet trick is > implemented at the VM level so we can rely on the JIT to perform extra > optimizations (e.g., unrolling and other loop optimizations; bounds-check > elimination and other array opts may be arraylet-aware, so most overhead is > cancelled or at least lifted out of loop bodies and hot paths). Second, no > support at all for huge arrays is incompatible with native code that expects > a contiguous layout, e.g. for the byte[]s inside Images - so all these uses > must be identified and fixed somehow, e.g. using DirectBuffers, or changing > the native layer to understand arraylets (image libraries may be OK with > banding), or in the worst case just copying the data to/from a contiguous, > native array (in most cases I think this copy already happens for other > reasons, so there's no extra copy, just a slightly more expensive copy). > > Now we're talking about some big VM change of course, but HotSpot would not > be the first production VM to do this so maybe it's a viable project for the > future, especially as Sun plans to keep raising the bar in > incremental/realtime GC (G1 may already be a great step forward, but huge > arrays will always spoil the fun for many apps). > > In summary I think the ChunkedArrayList would serve only as a stopgap > solution, with extremely limited benefits unless it's good enough that, > as Martin says, it can replace more List implementations. And I'll even > add, replace many other collections too - e.g.
a giant HashMap will contain > a giant Entry[] array inside it, I want this array to be chunked too > (ConcurrentHashMap already is, but it's tuned up differently, for concurrent > usage - and that's just one example anyway). And by "replace" I further mean > "change the implementation of all existing collections that are > array-backed", not "offer new collections" as the latter will only be > heavily used ten years from today when JavaSE7 is considered the minimum > JavaSE release to be supported by apps/libraries/frameworks/containers/etc. > Even then, the benefits will be clearly inferior to what can be achieved by > VM-level arraylets. > > A+ > Osvaldo > > > 2010/3/29 Martin Buchholz > > On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern >> wrote: >> > I put together the following class, ChunkedArrayList, in response to >> > Martin's request (excerpted from an earlier conversation on this web >> board) >> > below. >> > >> > >> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 >> > >> > Thoughts? >> >> This class is well on the way to what I was thinking of, >> but my bar for acceptance is a little higher. >> In particular, I don't want to add yet another class >> that can replace some, but not all, of the existing >> list implementations. >> >> Most obviously, I don't want to lose the ability, >> introduced in ArrayDeque, of having O(1) insertion >> at the front and end of the collection. >> Perhaps you can do this by having one "arraylet" >> always be shared by both ends, which >> grow towards each other in circular fashion. >> >> I also think we should shrink the array when >> necessary, so that occupancy never drops >> below, say 50%. >> >> Perhaps we should also have amortized O(1) >> insertion in the middle by using a "gap array". >> Probably more important for byte/char collections >> like StringBuilder...
>> >> I believe there are more complicated implementations >> that permit O(1) insertions at the ends, and only >> O(sqrt(N)) space overhead. >> >> .... >> >> E.g. Use your favorite search engine to do >> some research on: >> Resizable arrays in optimal time and space >> Succinct dynamic data structures >> >> Meta-comment: there is not enough transfer of >> academic research results into practice; I would think this >> is one of the responsibilities of the researchers. >> >> I presume you'd be willing to sign a >> contributor agreement to get your changes into >> the JDK someday. >> >> Martin >> >> > Regards, >> > >> > Kevin >> > >> > >> > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz >> > wrote: >> > >> > It surely is not a good idea to use a single backing array >> > for huge arrays. As you point out, it's up to 32GB >> > for just one object. But the core JDK >> > doesn't offer a suitable alternative for users who need very >> > large collections. >> > >> > It would have been more in the spirit of Java to have a >> > collection class instead of ArrayList that was not fastest at >> > any particular operation, but had excellent asymptotic behaviour, >> > based on backing arrays containing backing arrays. >> > But: >> > - no such excellent class has been written yet >> > (or please point me to such a class) >> > - even if it were, such a best-of-breed-general-purpose >> > List implementation would probably need to be introduced as a >> > separate class, because of the performance expectations of >> > existing implementations. >> > >> > In the meantime, we have to maintain what we got, >> > and that includes living with arrays and classes that wrap them. >> > >> > Changing the spec is unlikely to succeed.. >> > >> > Martin >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
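Editorial sketch: the idea discussed in the message above, a list backed by several small arrays instead of one huge array, can be illustrated roughly as follows. ChunkedIntList, its chunk size and its method names are hypothetical; this is not Kevin Stern's ChunkedArrayList (whose source is not reproduced in this archive) and not the structure from "Resizable arrays in optimal time and space". It assumes fixed power-of-two chunks, so growing the list copies only the top-level array of chunk references and never the elements.

```java
import java.util.Arrays;

/** Hypothetical sketch of a chunked list of ints; all names and sizes are assumptions. */
final class ChunkedIntList {
    private static final int SHIFT = 10;            // assumed chunk size: 1 << 10 = 1024 elements
    private static final int CHUNK = 1 << SHIFT;
    private static final int MASK = CHUNK - 1;

    private int[][] chunks = new int[4][];          // top-level array of chunk references
    private int size;

    /** Appends a value; existing elements are never copied. */
    void add(int value) {
        int chunkIndex = size >>> SHIFT;
        if (chunkIndex == chunks.length) {
            // Only the small array of references is copied when capacity runs out.
            chunks = Arrays.copyOf(chunks, chunks.length * 2);
        }
        if (chunks[chunkIndex] == null) {
            chunks[chunkIndex] = new int[CHUNK];
        }
        chunks[chunkIndex][size & MASK] = value;
        size++;
    }

    /** Maps an index to (chunk, offset) with a shift and a mask. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return chunks[index >>> SHIFT][index & MASK];
    }

    int size() {
        return size;
    }
}
```

With fixed-size chunks the largest single object stays bounded, at the cost of one extra indirection per access. Front insertion, shrinking and the O(sqrt(N)) space bound mentioned in the thread are deliberately left out of this sketch.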
From Weijun.Wang at Sun.COM Tue Mar 30 08:08:27 2010
From: Weijun.Wang at Sun.COM (Weijun Wang)
Date: Tue, 30 Mar 2010 16:08:27 +0800
Subject: java.util.Pair
Message-ID:

Hi All

There are multiple CRs asking for a java.util.Pair class:

   4983155
   6229146
   4947273

I know such a simple thing can be made very complex and everyone might want to add a new method into it. How about we just make it most primitive? Simply an immutable and Serializable class, two final fields, one constructor, two getters (?), and no static factory methods. (S)he who does the real implementation has the privilege to choose between head/tail and car/cdr.

Thanks
Max

From brucechapman at paradise.net.nz Tue Mar 30 09:03:44 2010
From: brucechapman at paradise.net.nz (Bruce Chapman)
Date: Tue, 30 Mar 2010 22:03:44 +1300
Subject: java.util.Pair
In-Reply-To:
References:
Message-ID: <4BB1BE70.2060901@paradise.net.nz>

Weijun Wang wrote:
> Hi All
>
> There are multiple CRs asking for a java.util.Pair class:
>
>    4983155
>    6229146
>    4947273
>
> I know such a simple thing can be made very complex and everyone might want to add a new method into it. How about we just make it most primitive? Simply an immutable and Serializable class, two final fields, one constructor, two getters (?), and no static factory methods. (S)he who does the real implementation has the privilege to choose between head/tail and car/cdr.
>
>
or first/second or left/right or a/b or foo/bar or chalk/cheese

Bruce
> Thanks
> Max
>
>

From kevin.l.stern at gmail.com Tue Mar 30 11:25:41 2010
From: kevin.l.stern at gmail.com (Kevin L. Stern)
Date: Tue, 30 Mar 2010 06:25:41 -0500
Subject: A List implementation backed by multiple small arrays rather than the traditional single large array.
In-Reply-To: <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> Message-ID: <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com> Hi Martin, Thanks much for your feedback. The first approach that comes to mind to implement O(1) time front as well as rear insertion is to create a cyclic list structure with a front/rear pointer - to insert at the front requires decrementing the front pointer (modulo the size) and to insert at the rear requires incrementing the rear pointer (modulo the size). We need to resize when the two pointers bump into each other. Could you explain more about your suggestion of introducing an arraylet that is shared by the front and the rear? It's not clear to me how that would help and/or be a better approach than the cyclic list. Anyhow, the paper that you reference, "Resizable arrays in optimal time and space", gives a deque so if we take that approach then the deque is specified. Shrinking the array is not a problem - this comes 'for free' (in the sense that it's required) in the optimal space data structure that you reference. Regarding the gap array suggestion, it is not clear to me how we will still compute the correct arraylet/offset for an index in O(1) time if we have arraylets of arbitrary size. Even worse, if we go with the optimal space data structure we will not have the option of creating arraylets of arbitrary size or with arbitrary gaps between elements. You are absolutely right about the n^(1/2) space overhead; I was not aware of this research. I'll go ahead and implement the structure defined in "Resizable arrays in optimal time and space" (once I find some time to do so). Regards, Kevin On Mon, Mar 29, 2010 at 2:23 AM, Martin Buchholz wrote: > On Sun, Mar 28, 2010 at 04:55, Kevin L. 
Stern > wrote: > > I put together the following class, ChunkedArrayList, in response to > > Martin's request (excerpted from an earlier conversation on this web > board) > > below. > > > > > https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 > > > > Thoughts? > > This class is well on the way to what I was thinking of, > but my bar for acceptance is a little higher. > In particular, I don't want to add yet another class > that is can replace some, but not all of existing > list implementations. > > Most obviously, I don't want to lose the ability, > introduced in ArrayDeque, of having O(1) insertion > at the front and end of the collection. > Perhaps you can do this by having one "arraylet" > always be shared by both ends, which > grow towards each other in circular fashion. > > I also think we should shrink the array when > necessary, so that occupancy never drops > below, say 50%. > > Perhaps we should also have amortized O(1) > insertion in the middle by using a "gap array". > Probably more important for byte/char collections > like StringBuilder... > > I believe there are more complicated implementations > that permit O(1) insertions at the ends, and only > O(sqrt(N)) space overhead. > > .... > > E.g. Use your favorite search engine to do > some research on: > Resizable arrays in optimal time and space > Succinct dynamic data structures > > Meta-comment: there is not enough transfer of > academic research results into practice; I would think this > is one of the responsibilities of the researchers. > > I presume you'd be willing to sign a > contributor agreement to get your changes into > the JDK someday. > > Martin > > > Regards, > > > > Kevin > > > > > > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz > > wrote: > > > > It surely is not a good idea to use a single backing array > > for huge arrays. As you point out, it's up to 32GB > > for just one object. 
But the core JDK > > doesn't offer a suitable alternative for users who need very > > large collections. > > > > It would have been more in the spirit of Java to have a > > collection class instead of ArrayList that was not fastest at > > any particular operation, but had excellent asymptotic behaviour, > > based on backing arrays containing backing arrays. > > But: > > - no such excellent class has been written yet > > (or please point me to such a class) > > - even if it were, such a best-of-breed-general-purpose > > List implementation would probably need to be introduced as a > > separate class, because of the performance expectations of > > existing implementations. > > > > In the meantime, we have to maintain what we got, > > and that includes living with arrays and classes that wrap them. > > > > Changing the spec is unlikely to succeed.. > > > > Martin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.l.stern at gmail.com Tue Mar 30 11:46:17 2010 From: kevin.l.stern at gmail.com (Kevin L. Stern) Date: Tue, 30 Mar 2010 06:46:17 -0500 Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com> Message-ID: <1704b7a21003300446y24f5524dwa61322af324bffd6@mail.gmail.com> Just to state the obvious, though, operations will be somewhat slower with the optimal space structure. Retrieval, for instance, requires more than simply a shift and a bit mask (although not too much more). On Tue, Mar 30, 2010 at 6:25 AM, Kevin L. Stern wrote: > Hi Martin, > > Thanks much for your feedback. 
The first approach that comes to mind to > implement O(1) time front as well as rear insertion is to create a cyclic > list structure with a front/rear pointer - to insert at the front requires > decrementing the front pointer (modulo the size) and to insert at the rear > requires incrementing the rear pointer (modulo the size). We need to resize > when the two pointers bump into each other. Could you explain more about > your suggestion of introducing an arraylet that is shared by the front and > the rear? It's not clear to me how that would help and/or be a better > approach than the cyclic list. Anyhow, the paper that you reference, > "Resizable arrays in optimal time and space", gives a deque so if we take > that approach then the deque is specified. > > Shrinking the array is not a problem - this comes 'for free' (in the sense > that it's required) in the optimal space data structure that you reference. > > Regarding the gap array suggestion, it is not clear to me how we will still > compute the correct arraylet/offset for an index in O(1) time if we have > arraylets of arbitrary size. Even worse, if we go with the optimal space > data structure we will not have the option of creating arraylets of > arbitrary size or with arbitrary gaps between elements. > > You are absolutely right about the n^(1/2) space overhead; I was not aware > of this research. I'll go ahead and implement the structure defined in > "Resizable arrays in optimal time and space" (once I find some time to do > so). > > Regards, > > Kevin > > > On Mon, Mar 29, 2010 at 2:23 AM, Martin Buchholz wrote: > >> On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern >> wrote: >> > I put together the following class, ChunkedArrayList, in response to >> > Martin's request (excerpted from an earlier conversation on this web >> board) >> > below. >> > >> > >> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 >> > >> > Thoughts? 
>> >> This class is well on the way to what I was thinking of, >> but my bar for acceptance is a little higher. >> In particular, I don't want to add yet another class >> that is can replace some, but not all of existing >> list implementations. >> >> Most obviously, I don't want to lose the ability, >> introduced in ArrayDeque, of having O(1) insertion >> at the front and end of the collection. >> Perhaps you can do this by having one "arraylet" >> always be shared by both ends, which >> grow towards each other in circular fashion. >> >> I also think we should shrink the array when >> necessary, so that occupancy never drops >> below, say 50%. >> >> Perhaps we should also have amortized O(1) >> insertion in the middle by using a "gap array". >> Probably more important for byte/char collections >> like StringBuilder... >> >> I believe there are more complicated implementations >> that permit O(1) insertions at the ends, and only >> O(sqrt(N)) space overhead. >> >> .... >> >> E.g. Use your favorite search engine to do >> some research on: >> Resizable arrays in optimal time and space >> Succinct dynamic data structures >> >> Meta-comment: there is not enough transfer of >> academic research results into practice; I would think this >> is one of the responsibilities of the researchers. >> >> I presume you'd be willing to sign a >> contributor agreement to get your changes into >> the JDK someday. >> >> Martin >> >> > Regards, >> > >> > Kevin >> > >> > >> > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz >> > wrote: >> > >> > It surely is not a good idea to use a single backing array >> > for huge arrays. As you point out, it's up to 32GB >> > for just one object. But the core JDK >> > doesn't offer a suitable alternative for users who need very >> > large collections. 
>> > >> > It would have been more in the spirit of Java to have a >> > collection class instead of ArrayList that was not fastest at >> > any particular operation, but had excellent asymptotic behaviour, >> > based on backing arrays containing backing arrays. >> > But: >> > - no such excellent class has been written yet >> > (or please point me to such a class) >> > - even if it were, such a best-of-breed-general-purpose >> > List implementation would probably need to be introduced as a >> > separate class, because of the performance expectations of >> > existing implementations. >> > >> > In the meantime, we have to maintain what we got, >> > and that includes living with arrays and classes that wrap them. >> > >> > Changing the spec is unlikely to succeed.. >> > >> > Martin >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ulf.Zibis at gmx.de Tue Mar 30 11:46:52 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Tue, 30 Mar 2010 13:46:52 +0200 Subject: Refactor String's exception handling In-Reply-To: <1ccfd1c11003291517l4a46f260s5c78639244da6420@mail.gmail.com> References: <4A95079A.8080803@gmx.de> <4BA007A4.2030907@sun.com> <4BA3F0B5.1070404@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> <4BA78CE8.9020107@gmx.de> <1ccfd1c11003221527q29f61f7u700344a99d293ceb@mail.gmail.com> <4BAE73B7.40101@gmx.de> <1ccfd1c11003291517l4a46f260s5c78639244da6420@mail.gmail.com> Message-ID: <4BB1E4AC.9000307@gmx.de> Am 30.03.2010 00:17, schrieb Martin Buchholz: > Hi Ulf, > > I will sponsor your initiative to refactor the exception handling. > > Before this can go in, we should have just the exception handling > changes contained in one patch, since it is such a big change. > You mean, that I had "surreptitiously" included some beautification, even in the first patch? 
Yes, often I can't resist, hit me.
Example: It seems that someone before had tried to standardize the this-triple in String's constructors. Looking closer, you can see that they differ slightly, so for my taste it looked best to order them in member-variable order, with the real value first.
On the other hand, I think it's too much overhead to manage separate bugs for such beautifications. What do you think is a reasonable threshold for such on-the-fly beautifications?

> I'd like you to try to port my related RangeCheckMicroBenchmark
> to string handling and hopefully demonstrate some measurable
> performance improvement.
>
That would be great. :-)

> ----
>
> In the code below, I think some if's need to be changed to "else if"s.
> (but don't just fix it - make sure we have a failing test with your
> current code (you do run the regression tests religiously, right?))
>
> +    static void checkPositionIndexes(int srcLen, int begin, int end) {
> +        assert (srcLen >= 0);
> +        int index;
> +        if (begin < 0)
> +            index = begin;
> +        if (end > srcLen)
> +            index = begin > srcLen ? begin : end - begin;
> +        if (end < begin)
> +            index = begin > srcLen ? begin : end < 0 ? end : end - begin;
> +        else
> +            return;
> +        throw new StringIndexOutOfBoundsException(index);
>
Good catch. The throws that I replaced had implied the elses before. In Google code style it would have been:
    if (begin < 0 || end > srcLen || end < begin)

You seem to like how I merged the different variations into one central standard behaviour. Is that valid for AbstractStringBuilder too? I think it best matches the current behaviour.
Exception message refers to ...
1. , if begin itself is invalid referring to 0 and srcLen
2. , if end itself is invalid referring to 0 and srcLen
3. , if end is invalid in combination with given begin
Alternative:
2+3.
, if end is invalid referring to 0 and srcLen or in combination with given begin
The alternative may be easier for developers to track, but it is less compatible with current behaviour, and a likely negative value kind of speaks for itself.
In the checkSubsequence(..., offset, count) case, unfortunately there is a good chance of getting positive values as the result of offset+count.

> ----
> it's => its
>
> +     * following values are referred in it's message:
>
Yes.

> ----
>
> badIndex might be a better name for "index" below.
>
> +        int index;
> +        if (begin < 0)
> +            index = begin;
>
Very good idea!

> ----
> Run at least the following tests
> (below is how I test this code myself)
>
> /home/martinrb/jct-tools/3.2.2_03/linux/bin/jtreg -v:nopass,fail
> -vmoption:-enablesystemassertions -automatic "-k:\!ignore"
> -testjdk:/usr/local/google/home/martin/ws/upstream/build/linux-amd64
> test/sun/nio/cs test/java/nio/charset test/java/lang/StringCoding
> test/java/lang/StringBuilder test/java/lang/StringBuffer
> test/java/lang/String test/java/lang/Appendable
>
Unfortunately I still haven't managed to build a patched JDK, even partly, on my Windows notebook.
- Cygwin crashes under too heavy a workload, e.g. webrev on more than ~20 files.
- There is very little support on the mailing list.
- I'm surprised that there is so little collaboration between NetBeans and JDK developers in the same software company.

So as a workaround, I'm fine with running my patches via -Xbootclasspath in the NetBeans IDE. That means I don't know how to run the jtreg tests. I wrote my tests exclusively using JUnit, because there is beautiful support for it in NetBeans. I remember there was an email from Mark Reinhold some months ago saying that JUnit tests are now supported by jtreg too.
Maybe you have some suggestions for me.

> ----
>
> I think returning len below is too confusing.
> Just make the return type void.
>
> +    int checkPositionIndex(int index) {
> +        int len = count; // not sure, if JIT recognizes that it's final ?
> +        checkPositionIndex(len, index);
> +        return len;
> +    }
>
Returning len avoids slowly loading the member variable into a local register/variable twice.
From the performance side, I think we only have two choices: using the return trick, or dropping those convenience methods altogether. The latter would be faster for the interpreter and/or the non-inlined case.

> ----
>
> We will need a significant merge once I commit
> related changes.
>
Maybe we could announce this on this list, so others could decide whether to hurry and commit their changes before, or to do their own merge later.

-Ulf

-------------- next part --------------
An HTML attachment was scrubbed...

From Ulf.Zibis at gmx.de Tue Mar 30 11:58:52 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Tue, 30 Mar 2010 13:58:52 +0200
Subject: Pending Character-related work
In-Reply-To: <4BB1C477.1040604@sun.com>
References: <1ccfd1c11003291547k1bde83d7n9e8ed3cf367aa4a0@mail.gmail.com> <4BB1C477.1040604@sun.com>
Message-ID: <4BB1E77C.4050600@gmx.de>

I'd like to add that all this bit-twiddling would be much easier if we got unsigned integers in Java. The comparison against 0, repeated everywhere and sometimes optimized differently, would become superfluous.
This hope seems gone for JDK 7. If it comes one day, we can start the twiddling again.

-Ulf

On 30.03.2010 11:29, Masayoshi Okutsu wrote:
> Hi Martin,
>
> I'm starting code review. I'm not an Oracle person yet, though. :-)
>
> Thanks,
> --
> Masayoshi
> Sun Microsystems K.K.
>
> On 3/30/2010 7:47 AM, Martin Buchholz wrote:
>> Hi Character team,
>>
>> Below is the (very long) list of pending changes in my queue to be
>> committed.
>>
>> ...
>>

From assembling.signals at yandex.ru Tue Mar 30 12:39:12 2010
From: assembling.signals at yandex.ru (assembling signals)
Date: Tue, 30 Mar 2010 16:39:12 +0400
Subject: java.util.Pair
In-Reply-To:
References:
Message-ID: <226001269952752@webmail122.yandex.ru>

Hi!
Do you mean it would be good to have a "standard" implementing class of the interface Map.Entry?
This does exist: AbstractMap.SimpleEntry.

Well, of course both the interface and the class are somewhat 'hidden', but nevertheless, they do exist.

30.03.10, 16:08, "Weijun Wang" :

> Hi All
>
> There are multiple CRs asking for a java.util.Pair class:
>
>    4983155
>    6229146
>    4947273
>
> I know such a simple thing can be made very complex and everyone might want to add a new method into it. How about we just make it most primitive? Simply an immutable and Serializable class, two final fields, one constructor, two getters (?), and no static factory methods. (S)he who does the real implementation has the privilege to choose between head/tail and car/cdr.
>
> Thanks
> Max
>
>

From opinali at gmail.com Tue Mar 30 13:30:57 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Tue, 30 Mar 2010 10:30:57 -0300
Subject: A List implementation backed by multiple small arrays rather than the traditional single large array.
In-Reply-To: <1704b7a21003291624n740dbc8bibbc15e1b8e0291d4@mail.gmail.com>
References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> <1704b7a21003291624n740dbc8bibbc15e1b8e0291d4@mail.gmail.com>
Message-ID:

The VM-based arraylet implementation is by design minimalistic: it only splits large arrays into smaller ones, nothing more. You must still wrap primitive arrays in collection APIs if you want anything else, including dynamic size. But the opportunity to get some extra VM help for dynamic sizing is obvious. Consider C's realloc() function.
The Java language doesn't currently have a realloc()-like API, because it's generally useless in a garbage-collected heap that most often does not use free lists, and most often allows compaction (realloc() is largely a clever, statistically-efficient trick to gain some performance back from fragmented heaps). Now, arraylets would enable a special kind of safe realloc() operation that makes sense for primitive arrays: it would always return a new array, in the sense that the "root" array (pointers to slices) is new; but sharing most of the slices with the original array. So if you have a 100K-element array (that needs a 100-element root array for 1K slices), and grow it into 150K-position, we only need to allocate new slices for the extra 50K positions, plus the new 150-element root array. And we only need to copy data from the 100 positions of the old root array to the new one (and maybe, from a single slice in the end of the original array, if its size didn't match the maximum slice size - but then the collections growing algorithm could easily avoid this). Array shrinking is even easier. Notice that the old root array is still a live object, and now its slices are aliased by a new root array, but this is only potentially confusing, it's not unsafe. Collections would encapsulate these arrays and not expose any aliasing or sharing (by not keeping any reference to the original array after resize operations). A+ Osvaldo 2010/3/29 Kevin L. Stern > One advantage of this approach over the VM approach is that no data copy is > necessary when the capacity of the data structure is expanded (new arrays > are tacked on to the end of the top level array of references) or contracted > (arrays of null are removed from the top level array of references) aside > from the (any?) copy of the top level array of references. One way to > address your concern, though, is to create a ChunkedArray class that simply > wraps an array and provides expand and contract functionality. 
This could > be reused in any/all collections. > > > On Mon, Mar 29, 2010 at 10:08 AM, Osvaldo Doederlein wrote: > >> Initially, it would be good enough to replace only java.util.ArrayList >> with minimal overhead. ArrayList does not support efficient add-at-front or >> other enhancements of ArrayDeque; but ArrayList is still a much more >> important and popular collection, it's the primary "straight replacement for >> primitive arrrays" and I guess it should continue with that role. >> >> One problem of both ArrayList and primitive arrays is that they're not >> GC-friendly; huge arrays suck for GC. IBM's realtime Metronome collector >> uses the "arraylet" structure for primitive arrays, so there is a hard >> upper-limit on object size (well, at least as long as apps don't define >> classes with thousands of fields, I guess). This avoids the whole issue of >> "large objects" which permits a simpler heap layout, better incremental GC, >> etc. There are two tradeoffs. First, some overhead for all array operations >> - but this is the least important, remarkably as the arraylet trick is >> implement at the VM level so we can rely on the JIT to perform extra >> optimizations (e.g., unrolling and other loop optimizations; bounds-check >> elimination and other array opts, may be arraylet-aware so most overhead is >> cancelled or at least lifted out of loop bodies and hot paths.) Second, no >> support at all for huge arrays is incompatible with native code that expects >> a continuous layout, e.g. for the byte[]s inside Images - so all these uses >> must be identified and fixed somehow, e.g using DirectBuffers, or changing >> the native layer to understand arraylets (image libaries may be OK with >> banding), or in the worst case just copy the data to/from a continuous, >> native array (in most cases I think this copy already happens for other >> reasons, so there's no extra copy, just a slightly more expensive copy). 
>> >> Now we're talking about some big VM change of course, but HotSpot would >> not be the first production VM to do this so maybe it's a viable project for >> the future, remarkably as Sun plans to keep raising the bar in >> incremental/realtime GC (G1 may already be a great step forward, but huge >> arrays will always spoil the fun for many apps). >> >> In summary I think the ChunkedArrayList would serve only as a stopgap >> solution, with extremely limited benefits unless it's sufficiently good so >> like Martin says, we can replace more List implementations. And I'll even >> add, replace many other collections too - e.g. a giant HashMap will contain >> a giant Entry[] array inside it, I want this array to be chunked too >> (ConcurrentHashMap already is, but it's tuned up differently, for concurrent >> usage - and that's just one example anyway). And by "replace" I further mean >> "change the implementation of all existing collections that are >> array-backed", not "offer new collections" as the latter will only be >> heavily used ten years from today when JavaSE7 is considered the minimum >> JavaSE release to be supported by apps/libraries/frameworks/containers/etc. >> Even then, the benefits will be clearly inferior to what can be achireved by >> VM-level arraylets. >> >> A+ >> Osvaldo >> >> >> 2010/3/29 Martin Buchholz >> >> On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern >>> wrote: >>> > I put together the following class, ChunkedArrayList, in response to >>> > Martin's request (excerpted from an earlier conversation on this web >>> board) >>> > below. >>> > >>> > >>> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 >>> > >>> > Thoughts? >>> >>> This class is well on the way to what I was thinking of, >>> but my bar for acceptance is a little higher. >>> In particular, I don't want to add yet another class >>> that is can replace some, but not all of existing >>> list implementations. 
>>> >>> Most obviously, I don't want to lose the ability, >>> introduced in ArrayDeque, of having O(1) insertion >>> at the front and end of the collection. >>> Perhaps you can do this by having one "arraylet" >>> always be shared by both ends, which >>> grow towards each other in circular fashion. >>> >>> I also think we should shrink the array when >>> necessary, so that occupancy never drops >>> below, say 50%. >>> >>> Perhaps we should also have amortized O(1) >>> insertion in the middle by using a "gap array". >>> Probably more important for byte/char collections >>> like StringBuilder... >>> >>> I believe there are more complicated implementations >>> that permit O(1) insertions at the ends, and only >>> O(sqrt(N)) space overhead. >>> >>> .... >>> >>> E.g. Use your favorite search engine to do >>> some research on: >>> Resizable arrays in optimal time and space >>> Succinct dynamic data structures >>> >>> Meta-comment: there is not enough transfer of >>> academic research results into practice; I would think this >>> is one of the responsibilities of the researchers. >>> >>> I presume you'd be willing to sign a >>> contributor agreement to get your changes into >>> the JDK someday. >>> >>> Martin >>> >>> > Regards, >>> > >>> > Kevin >>> > >>> > >>> > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz >>> > wrote: >>> > >>> > It surely is not a good idea to use a single backing array >>> > for huge arrays. As you point out, it's up to 32GB >>> > for just one object. But the core JDK >>> > doesn't offer a suitable alternative for users who need very >>> > large collections. >>> > >>> > It would have been more in the spirit of Java to have a >>> > collection class instead of ArrayList that was not fastest at >>> > any particular operation, but had excellent asymptotic behaviour, >>> > based on backing arrays containing backing arrays. 
>>> > But: >>> > - no such excellent class has been written yet >>> > (or please point me to such a class) >>> > - even if it were, such a best-of-breed-general-purpose >>> > List implementation would probably need to be introduced as a >>> > separate class, because of the performance expectations of >>> > existing implementations. >>> > >>> > In the meantime, we have to maintain what we got, >>> > and that includes living with arrays and classes that wrap them. >>> > >>> > Changing the spec is unlikely to succeed.. >>> > >>> > Martin >>> > >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Tue Mar 30 17:54:17 2010 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 30 Mar 2010 10:54:17 -0700 Subject: java.util.Pair In-Reply-To: References: Message-ID: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> Pair is only a partial, flawed solution to a special case (n=2) of a very significant problem: the disproportionate complexity of creating value types in Java. I support addressing the underlying problem in Java 8, and not littering the API with dead-end solutions like Pair. On Tue, Mar 30, 2010 at 1:08 AM, Weijun Wang wrote: > Hi All > > There are multiple CRs asking for a java.util.Pair class: > > 4983155 > 6229146 > 4947273 > > I know such a simple thing can be made very complex and everyone might want > to add a new method into it. How about we just make it most primitive? > Simply an immutable and Serializable class, two final fields, one > constructor, two getters (?), and no static factory methods. (S)he who does > the real implementation has the privilege to choose between head/tail and > car/cdr. > > Thanks > Max > > -- Kevin Bourrillion @ Google internal: http://goto/javalibraries external: http://guava-libraries.googlecode.com -------------- next part -------------- An HTML attachment was scrubbed... 
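Editorial sketch: the realloc()-like resize that Osvaldo describes above (a new root array that shares most slices with the old one) can be modelled at the library level roughly as follows. SlicedLongArray, the 1K slice size and the method names are hypothetical; real arraylets live inside the VM and are not exposed through any Java API. The sketch assumes every slice is allocated at full size, which is the case where no element data has to be copied at all.

```java
import java.util.Arrays;

/** Hypothetical library-level model of an arraylet-backed long[]; not a VM feature. */
final class SlicedLongArray {
    static final int SLICE = 1024;      // assumed slice size (the "1K slices" of the example)

    final long[][] root;                // root array: references to the slices
    final int length;                   // logical number of elements

    SlicedLongArray(int length) {
        this(allocate(new long[0][], slicesFor(length)), length);
    }

    private SlicedLongArray(long[][] root, int length) {
        this.root = root;
        this.length = length;
    }

    private static int slicesFor(int length) {
        return (length + SLICE - 1) / SLICE;
    }

    /** Copies the slice references of oldRoot and allocates only the missing slices. */
    private static long[][] allocate(long[][] oldRoot, int slices) {
        long[][] newRoot = Arrays.copyOf(oldRoot, slices);   // copies references, not elements
        for (int i = oldRoot.length; i < slices; i++) {
            newRoot[i] = new long[SLICE];
        }
        return newRoot;
    }

    /** realloc-like resize: always a new root, with existing slices shared (aliased). */
    SlicedLongArray resize(int newLength) {
        return new SlicedLongArray(allocate(root, slicesFor(newLength)), newLength);
    }

    long get(int index) {
        checkIndex(index);
        return root[index / SLICE][index % SLICE];
    }

    void set(int index, long value) {
        checkIndex(index);
        root[index / SLICE][index % SLICE] = value;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Growing a 100K-element array to 150K elements this way copies 100 slice references and allocates 50 new slices. The old and the new array alias the same slices, so a collection built on this would have to drop its reference to the old array after resizing, as the message points out. Shrinking and then growing again would also expose stale values in a shared tail slice; the sketch does not clear them.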
URL: From scolebourne at joda.org Tue Mar 30 20:39:00 2010 From: scolebourne at joda.org (Stephen Colebourne) Date: Tue, 30 Mar 2010 16:39:00 -0400 Subject: java.util.Pair In-Reply-To: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> Message-ID: <4b4f45e01003301339k5f110a74o2831de234b97d381@mail.gmail.com> While I support Kevin?s summary, having a public implementation of Map.Entry in java.util would be very useful. (Along with making other private classes public - unmodifiable iterator is one IIRC) Stephen On 30 March 2010 13:54, Kevin Bourrillion wrote: > Pair is only a partial, flawed solution to a special case (n=2) of a very > significant problem: the disproportionate complexity of creating value types > in Java. ?I support addressing the underlying problem in Java 8, and not > littering the API with dead-end solutions like Pair. > > > On Tue, Mar 30, 2010 at 1:08 AM, Weijun Wang wrote: >> >> Hi All >> >> There are multiple CRs asking for a java.util.Pair class: >> >> ? 4983155 >> ? 6229146 >> ? 4947273 >> >> I know such a simple thing can be made very complex and everyone might >> want to add a new method into it. How about we just make it most primitive? >> Simply an immutable and Serializable class, two final fields, one >> constructor, two getters (?), and no static factory methods. (S)he who does >> the real implementation has the privilege to choose between head/tail and >> car/cdr. >> >> Thanks >> Max >> > > > > -- > Kevin Bourrillion @ Google > internal:? 
http://goto/javalibraries
> external: http://guava-libraries.googlecode.com
>
>

From martinrb at google.com Tue Mar 30 20:55:06 2010
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 30 Mar 2010 13:55:06 -0700
Subject: java.util.Pair
In-Reply-To: <4b4f45e01003301339k5f110a74o2831de234b97d381@mail.gmail.com>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
 <4b4f45e01003301339k5f110a74o2831de234b97d381@mail.gmail.com>
Message-ID: <1ccfd1c11003301355v7a9975feoaf1aebe05fdf688b@mail.gmail.com>

On Tue, Mar 30, 2010 at 13:39, Stephen Colebourne wrote:
> While I support Kevin's summary, having a public implementation of
> Map.Entry in java.util would be very useful. (Along with making other
> private classes public - unmodifiable iterator is one IIRC)

./AbstractMap.java:569:    public static class SimpleEntry<K,V>
./AbstractMap.java:699:    public static class SimpleImmutableEntry<K,V>

---

Which unmodifiable iterator?

From scolebourne at joda.org Tue Mar 30 21:01:57 2010
From: scolebourne at joda.org (Stephen Colebourne)
Date: Tue, 30 Mar 2010 17:01:57 -0400
Subject: java.util.Pair
In-Reply-To: <1ccfd1c11003301355v7a9975feoaf1aebe05fdf688b@mail.gmail.com>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
 <4b4f45e01003301339k5f110a74o2831de234b97d381@mail.gmail.com>
 <1ccfd1c11003301355v7a9975feoaf1aebe05fdf688b@mail.gmail.com>
Message-ID: <4b4f45e01003301401s1641c714qa6e4a66aab6a9b81@mail.gmail.com>

(I'm writing from a slow connection in a national park in Chile)

I meant a decorator for an iterator that wraps the original, making it
immutable.

Stephen

On 30 March 2010 16:55, Martin Buchholz wrote:
> On Tue, Mar 30, 2010 at 13:39, Stephen Colebourne wrote:
>> While I support Kevin's summary, having a public implementation of
>> Map.Entry in java.util would be very useful. (Along with making other
>> private classes public - unmodifiable iterator is one IIRC)
>
> ./AbstractMap.java:569:    public static class SimpleEntry<K,V>
> ./AbstractMap.java:699:    public static class SimpleImmutableEntry<K,V>
>
> ---
>
> Which unmodifiable iterator?
>

From martinrb at google.com Tue Mar 30 22:20:22 2010
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 30 Mar 2010 15:20:22 -0700
Subject: A List implementation backed by multiple small arrays rather than the traditional single large array.
In-Reply-To: <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com>
References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com>
 <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com>
 <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com>
Message-ID: <1ccfd1c11003301520g564876fehfce57def62f6d6b3@mail.gmail.com>

On Tue, Mar 30, 2010 at 04:25, Kevin L. Stern wrote:
> Hi Martin,
>
> Thanks much for your feedback. The first approach that comes to mind to
> implement O(1) time front as well as rear insertion is to create a cyclic
> list structure with a front/rear pointer - to insert at the front requires
> decrementing the front pointer (modulo the size) and to insert at the rear
> requires incrementing the rear pointer (modulo the size). We need to resize
> when the two pointers bump into each other. Could you explain more about
> your suggestion of introducing an arraylet that is shared by the front and
> the rear?

It was a half-baked idea - I don't know if there's a way to turn it into
something useful. I was thinking of the ArrayDeque implementation,
where all the elements live in a single array.

> It's not clear to me how that would help and/or be a better
> approach than the cyclic list. Anyhow, the paper that you reference,
> "Resizable arrays in optimal time and space", gives a deque so if we take
> that approach then the deque is specified.

Technically, ArrayList also supports the Deque operations -
just not efficiently.
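
For readers following the thread, the cyclic front/rear-pointer scheme quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name RingDeque is hypothetical, the capacity is kept at a power of two so that "modulo the size" becomes a bit mask, and it is not the actual ArrayDeque source nor the structure from the referenced paper.

```java
/** Hypothetical sketch of a cyclic array deque; not the JDK's ArrayDeque. */
final class RingDeque<E> {
    // Assumption of this sketch: capacity is always a power of two,
    // so (i & (length - 1)) computes i modulo the capacity.
    private Object[] elements = new Object[8];
    private int front; // index of the first element
    private int rear;  // index one past the last element (modulo capacity)

    /** Inserts at the front by decrementing the front pointer (modulo capacity). */
    void addFirst(E e) {
        front = (front - 1) & (elements.length - 1);
        elements[front] = e;
        if (front == rear) grow(); // the two pointers bumped into each other
    }

    /** Inserts at the rear by incrementing the rear pointer (modulo capacity). */
    void addLast(E e) {
        elements[rear] = e;
        rear = (rear + 1) & (elements.length - 1);
        if (front == rear) grow();
    }

    /** Random access in O(1): offset from the front pointer, wrapped around. */
    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size())
            throw new IndexOutOfBoundsException("index: " + index);
        return (E) elements[(front + index) & (elements.length - 1)];
    }

    int size() {
        return (rear - front) & (elements.length - 1);
    }

    /** Doubles the array; called only when it is completely full. */
    private void grow() {
        int n = elements.length;
        Object[] bigger = new Object[n * 2];
        int tail = n - front; // elements from front to the end of the old array
        System.arraycopy(elements, front, bigger, 0, tail);
        System.arraycopy(elements, 0, bigger, tail, front);
        elements = bigger;
        front = 0;
        rear = n;
    }
}
```

Both insertions are amortized O(1) because the array doubles on overflow. The sketch never shrinks, so it does not address the occupancy or space-overhead concerns raised earlier in the thread.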
From ben_manes at yahoo.com Tue Mar 30 22:45:34 2010 From: ben_manes at yahoo.com (Ben Manes) Date: Tue, 30 Mar 2010 15:45:34 -0700 (PDT) Subject: A List implementation backed by multiple small arrays rather than the traditional single large array. In-Reply-To: <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com> References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com> Message-ID: <575486.92749.qm@web38807.mail.mud.yahoo.com> You might be able to take some ideas from the VList data structure and zipper model. The research on persistent data structures tend to have some fairly interesting ideas and some approaches might work well here. ________________________________ From: Kevin L. Stern To: Martin Buchholz Cc: core-libs-dev at openjdk.java.net Sent: Tue, March 30, 2010 4:25:41 AM Subject: Re: A List implementation backed by multiple small arrays rather than the traditional single large array. Hi Martin, Thanks much for your feedback. The first approach that comes to mind to implement O(1) time front as well as rear insertion is to create a cyclic list structure with a front/rear pointer - to insert at the front requires decrementing the front pointer (modulo the size) and to insert at the rear requires incrementing the rear pointer (modulo the size). We need to resize when the two pointers bump into each other. Could you explain more about your suggestion of introducing an arraylet that is shared by the front and the rear? It's not clear to me how that would help and/or be a better approach than the cyclic list. Anyhow, the paper that you reference, "Resizable arrays in optimal time and space", gives a deque so if we take that approach then the deque is specified. Shrinking the array is not a problem - this comes 'for free' (in the sense that it's required) in the optimal space data structure that you reference. 
Regarding the gap array suggestion, it is not clear to me how we will still compute the correct arraylet/offset for an index in O(1) time if we have arraylets of arbitrary size. Even worse, if we go with the optimal space data structure we will not have the option of creating arraylets of arbitrary size or with arbitrary gaps between elements. You are absolutely right about the n^(1/2) space overhead; I was not aware of this research. I'll go ahead and implement the structure defined in "Resizable arrays in optimal time and space" (once I find some time to do so). Regards, Kevin On Mon, Mar 29, 2010 at 2:23 AM, Martin Buchholz wrote: On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern wrote: >>> I put together the following class, ChunkedArrayList, in response to >>> Martin's request (excerpted from an earlier conversation on this web board) >>> below. >>> >>> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50 >> >> >>> Thoughts? > >This class is well on the way to what I was thinking of, >>but my bar for acceptance is a little higher. >>In particular, I don't want to add yet another class >>that is can replace some, but not all of existing >>list implementations. > >>Most obviously, I don't want to lose the ability, >>introduced in ArrayDeque, of having O(1) insertion >>at the front and end of the collection. >>Perhaps you can do this by having one "arraylet" >>always be shared by both ends, which >>grow towards each other in circular fashion. > >>I also think we should shrink the array when >>necessary, so that occupancy never drops >>below, say 50%. > >>Perhaps we should also have amortized O(1) >>insertion in the middle by using a "gap array". >>Probably more important for byte/char collections >>like StringBuilder... > >>I believe there are more complicated implementations >>that permit O(1) insertions at the ends, and only >>O(sqrt(N)) space overhead. > >>.... > >>E.g. 
Use your favorite search engine to do >>some research on: >>Resizable arrays in optimal time and space >>Succinct dynamic data structures > >>Meta-comment: there is not enough transfer of >>academic research results into practice; I would think this >>is one of the responsibilities of the researchers. > >>I presume you'd be willing to sign a >>contributor agreement to get your changes into >>the JDK someday. > >>Martin > > >>> Regards, >>> >>> Kevin >>> >>> >>> On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz >>> wrote: >>> >>> It surely is not a good idea to use a single backing array >>> for huge arrays. As you point out, it's up to 32GB >>> for just one object. But the core JDK >>> doesn't offer a suitable alternative for users who need very >>> large collections. >>> >>> It would have been more in the spirit of Java to have a >>> collection class instead of ArrayList that was not fastest at >>> any particular operation, but had excellent asymptotic behaviour, >>> based on backing arrays containing backing arrays. >>> But: >>> - no such excellent class has been written yet >>> (or please point me to such a class) >>> - even if it were, such a best-of-breed-general-purpose >>> List implementation would probably need to be introduced as a >>> separate class, because of the performance expectations of >>> existing implementations. >>> >>> In the meantime, we have to maintain what we got, >>> and that includes living with arrays and classes that wrap them. >>> >>> Changing the spec is unlikely to succeed.. >>> >>> Martin >>> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jason_mehrens at hotmail.com Tue Mar 30 23:11:58 2010
From: jason_mehrens at hotmail.com (Jason Mehrens)
Date: Tue, 30 Mar 2010 18:11:58 -0500
Subject: java.util.Pair
In-Reply-To: <4b4f45e01003301401s1641c714qa6e4a66aab6a9b81@mail.gmail.com>
References: ,
 <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>,
 <4b4f45e01003301339k5f110a74o2831de234b97d381@mail.gmail.com>,
 <1ccfd1c11003301355v7a9975feoaf1aebe05fdf688b@mail.gmail.com>,
 <4b4f45e01003301401s1641c714qa6e4a66aab6a9b81@mail.gmail.com>
Message-ID:

Stephen,

I'm all for adding support for unmodifiableIterable,
unmodifiableNavigableMap, and unmodifiableNavigableSet. However, I think
adding public access to such an iterator decorator goes against the
guidelines of the collections design faq (4 and 5):

http://java.sun.com/javase/6/docs/technotes/guides/collections/designfaq.html#8

So following the guideline of never passing an Iterator around, that
leaves you with the following:

1. Your custom container is backed by a collection: use
   Collections.unmodifiableXXX(this.internal).iterator()
2. Your custom container is backed by an array: use Arrays.asList,
   followed by point 1.
3. Your custom container has a specialized layout: you have to write an
   iterator with a remove implementation that removes or throws, and the
   unmodifiable one is easy to write.

Assuming that the JDK had an unmodifiableIterable decorator, is there a
corner case that I'm not seeing, or is the main reservation the extra
method calls and the creation of some well-behaved garbage?

Jason

> Date: Tue, 30 Mar 2010 17:01:57 -0400
> Subject: Re: java.util.Pair
> From: scolebourne at joda.org
> To: core-libs-dev at openjdk.java.net
>
> (I'm writing from a slow connection in a national park in Chile)
>
> I meant a decorator for an iterator that wraps the original, making it
> immutable.
>
> Stephen

From joe.darcy at Oracle.com Tue Mar 30 23:34:52 2010
From: joe.darcy at Oracle.com (joe.darcy at Oracle.com)
Date: Tue, 30 Mar 2010 16:34:52 -0700
Subject: java.util.Pair
In-Reply-To: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
Message-ID: <4BB28A9C.10705@oracle.com>

On 3/30/2010 10:54 AM, Kevin Bourrillion wrote:
> Pair is only a partial, flawed solution to a special case (n=2) of a
> very significant problem: the disproportionate complexity of creating
> value types in Java. I support addressing the underlying problem in
> Java 8, and not littering the API with dead-end solutions like Pair.

While I have sympathy with that conclusion, there is the
side-effect of littering many APIs with the flotsam of lots of different
classes named "Pair." My inclination would be to produce one adequate
Pair class in the JDK to prevent the proliferation of yet more Pair
classes in other code bases.

I should know better than to take the bait, below is a first cut at
java.util.Pair.

-Joe

package java.util;

import java.util.Objects;

/**
 * An immutable pair of values. The values may be null. The values
 * themselves may be mutable.
 *
 * @param <A> the type of the first element of the pair
 * @param <B> the type of the second element of the pair
 *
 * @since 1.7
 */
public final class Pair<A, B> {
    private final A a;
    private final B b;

    private Pair(A a, B b) {
        this.a = a;
        this.b = b;
    }

    /**
     * Returns a pair whose elements are the first and second
     * arguments, respectively.
     * @return a pair constructed from the arguments
     */
    public static <C, D> Pair<C, D> valueOf(C c, D d) {
        // Don't mandate new values.
        return new Pair<C, D>(c, d);
    }

    /**
     * Returns the value of the first element of the pair.
     * @return the value of the first element of the pair
     */
    public A getA() {
        return a;
    }

    /**
     * Returns the value of the second element of the pair.
     * @return the value of the second element of the pair
     */
    public B getB() {
        return b;
    }

    /**
     * TBD
     */
    @Override
    public String toString() {
        return "[" + Objects.toString(a) +
            ", " + Objects.toString(b) + "]";
    }

    /**
     * TBD
     */
    @Override
    public boolean equals(Object x) {
        if (!(x instanceof Pair))
            return false;
        else {
            Pair<?,?> that = (Pair<?,?>) x;
            return
                Objects.equals(this.a, that.a) &&
                Objects.equals(this.b, that.b);
        }
    }

    /**
     * TBD
     */
    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }
}

From xueming.shen at sun.com Wed Mar 31 02:15:11 2010
From: xueming.shen at sun.com (xueming.shen at sun.com)
Date: Wed, 31 Mar 2010 02:15:11 +0000
Subject: hg: jdk7/tl/jdk: 6902790: Converting/displaying HKSCs characters issue on Vista and Windows7; ...
Message-ID: <20100331021524.C3C5044ABB@hg.openjdk.java.net>

Changeset: 3771ac2a8b3b
Author: sherman
Date: 2010-03-30 19:10 -0700
URL: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/3771ac2a8b3b

6902790: Converting/displaying HKSCs characters issue on Vista and Windows7
6911753: NSN wants to add Big5 HKSCS-2004 support
Summary: support HKSCS2008 in Big5_HKSCS and MS950_HKSCS
Reviewed-by: okutsu

! make/sun/nio/cs/FILES_java.gmk
! make/sun/nio/cs/Makefile
+ make/tools/CharsetMapping/Big5.c2b
+ make/tools/CharsetMapping/Big5.map
+ make/tools/CharsetMapping/Big5.nr
+ make/tools/CharsetMapping/HKSCS2001.c2b
+ make/tools/CharsetMapping/HKSCS2001.map
+ make/tools/CharsetMapping/HKSCS2008.c2b
+ make/tools/CharsetMapping/HKSCS2008.map
+ make/tools/CharsetMapping/HKSCS_XP.c2b
+ make/tools/CharsetMapping/HKSCS_XP.map
!
make/tools/CharsetMapping/dbcs - make/tools/src/build/tools/charsetmapping/CharsetMapping.java + make/tools/src/build/tools/charsetmapping/DBCS.java + make/tools/src/build/tools/charsetmapping/EUC_TW.java - make/tools/src/build/tools/charsetmapping/GenerateDBCS.java - make/tools/src/build/tools/charsetmapping/GenerateEUC_TW.java - make/tools/src/build/tools/charsetmapping/GenerateMapping.java - make/tools/src/build/tools/charsetmapping/GenerateSBCS.java + make/tools/src/build/tools/charsetmapping/HKSCS.java + make/tools/src/build/tools/charsetmapping/JIS0213.java ! make/tools/src/build/tools/charsetmapping/Main.java + make/tools/src/build/tools/charsetmapping/SBCS.java + make/tools/src/build/tools/charsetmapping/Utils.java ! src/share/classes/sun/awt/HKSCS.java ! src/share/classes/sun/io/ByteToCharBig5.java ! src/share/classes/sun/io/ByteToCharBig5_HKSCS.java ! src/share/classes/sun/io/ByteToCharBig5_Solaris.java - src/share/classes/sun/io/ByteToCharHKSCS.java - src/share/classes/sun/io/ByteToCharHKSCS_2001.java ! src/share/classes/sun/io/ByteToCharMS950_HKSCS.java ! src/share/classes/sun/io/CharToByteBig5.java ! src/share/classes/sun/io/CharToByteBig5_HKSCS.java ! src/share/classes/sun/io/CharToByteBig5_Solaris.java - src/share/classes/sun/io/CharToByteHKSCS.java - src/share/classes/sun/io/CharToByteHKSCS_2001.java ! src/share/classes/sun/io/CharToByteMS950_HKSCS.java - src/share/classes/sun/nio/cs/ext/Big5.java ! src/share/classes/sun/nio/cs/ext/Big5_HKSCS.java + src/share/classes/sun/nio/cs/ext/Big5_HKSCS_2001.java ! src/share/classes/sun/nio/cs/ext/Big5_Solaris.java ! src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java ! src/share/classes/sun/nio/cs/ext/HKSCS.java - src/share/classes/sun/nio/cs/ext/HKSCS_2001.java ! src/share/classes/sun/nio/cs/ext/MS950_HKSCS.java + src/share/classes/sun/nio/cs/ext/MS950_HKSCS_XP.java ! src/solaris/classes/sun/awt/fontconfigs/solaris.fontconfig.properties ! src/solaris/native/java/lang/java_props_md.c ! 
src/windows/classes/sun/awt/windows/fontconfig.properties ! src/windows/native/java/lang/java_props_md.c ! test/java/nio/charset/Charset/NIOCharsetAvailabilityTest.java ! test/java/nio/charset/Charset/RegisteredCharsets.java From martinrb at google.com Wed Mar 31 04:41:37 2010 From: martinrb at google.com (Martin Buchholz) Date: Tue, 30 Mar 2010 21:41:37 -0700 Subject: Refactor String's exception handling In-Reply-To: <4BB1E4AC.9000307@gmx.de> References: <4A95079A.8080803@gmx.de> <1ccfd1c11003191713w7178db28u161fd7c42127a775@mail.gmail.com> <4BA543A0.2060600@gmx.de> <1ccfd1c11003210056r13140d02kedc569722567ea2e@mail.gmail.com> <1ccfd1c11003211239h2105e5f1m903dd5d3fbf5387b@mail.gmail.com> <4BA78CE8.9020107@gmx.de> <1ccfd1c11003221527q29f61f7u700344a99d293ceb@mail.gmail.com> <4BAE73B7.40101@gmx.de> <1ccfd1c11003291517l4a46f260s5c78639244da6420@mail.gmail.com> <4BB1E4AC.9000307@gmx.de> Message-ID: <1ccfd1c11003302141v206eff50wd4497fa89c93539d@mail.gmail.com> On Tue, Mar 30, 2010 at 04:46, Ulf Zibis wrote: > Am 30.03.2010 00:17, schrieb Martin Buchholz: > > Hi Ulf, > > I will sponsor your initiative to refactor the exception handling. > > Before this can go in, we should have just the exception handling > changes contained in one patch, since it is such a big change. > > > You mean, that I had "surreptitiously" included some beautification, even in > the first patch? > Yes, often I can't resist, hit me. Example: > It seems, that someone before had tried to standardize the this-triple in > String's constructors. Looking closer, you can see, that they slightly > differ, so for my taste it looked best, ordering them in the member > variables order, having the real value at first. Yes, it's a tough question as to how finely to split changes. The overhead of creating separate changes (must have a bug ID) is unfortunately higher than we'd like. That said, there are big advantages of separating out purely cosmetic large changes. E.g. 
we can verify that the generated bytecode is identical. This becomes much more important for pervasive mechanical changes, like changing @exception => @throws. > On the other hand, I think it's too much overhead, to manage separate bugs > for such beautifications. > What you think is a reasonable threshold for such on-the-fly > beautifications? > You seem to like how I merged the different variations into one central > standard behaviour. Is that valid for AbstractStringBuilder too? > I think it best matches to current behavior. > Exception message refers to ... > 1. , if begin itself is invalid referring to 0 and srcLen > 2. , if end itself is invalid referring to 0 and srcLen > 3. , if end is invalid in combination with given begin > Alternative: > 2+3. , if end is invalid referring to 0 and srcLen or in combination > with given begin Better detail messages are slightly incompatible, but helpful for most users. Should we switch? It depends on how much we value compatibility. Probably the JDK culture is still too conservative. > ---- > Run at least the following tests > (below is how I test this code myself) > > /home/martinrb/jct-tools/3.2.2_03/linux/bin/jtreg -v:nopass,fail > -vmoption:-enablesystemassertions -automatic "-k:\!ignore" > -testjdk:/usr/local/google/home/martin/ws/upstream/build/linux-amd64 > test/sun/nio/cs test/java/nio/charset test/java/lang/StringCoding > test/java/lang/StringBuilder test/java/lang/StringBuffer > test/java/lang/String test/java/lang/Appendable > > > Unfortunately I still haven't managed to even partly build a patched JDK on > my Windows notebook. It's fine to run javac + use -Xbootclasspath. > - CygWin crashes from too big work, e.g webrev on more than ~20 files. > - Very few support on mailing list. > - I'm wondering, that there is so few collaboration between NetBeans and JDK > developers in same software company. > > So as workaround, I'm fine with running my patches via -Xbootclasspath in > NetBeans IDE. 
> So running jtreg tests I don't know how. You should really learn how to run JDK tests. JDK development on Linux is easier than development on Windows, but it certainly should be possible on Windows. Recent versions of jtreg are probably easier to run on Windows. http://openjdk.java.net/jtreg/index.html > I exclusively had written my test using JUnit, because there is a beautiful > support from NetBeans. > I remember, there was a email from Mark Reinold some months ago, that JUnit > tests are too supported by jtreg from now. I would be interested in that as well. > Maybe you have some suggestions to me. > > ---- > > I think returning len below is too confusing. > Just make the return type void. > > + int checkPositionIndex(int index) { > + int len = count; // not sure, if JIT recognizes that it's final ? > + checkPositionIndex(len, index); > + return len; > + } > > > Returning the len is to prevent from 2 times slowly loading the member > variable into local register/variable. > From performance side I think, we only have to choices. Using the return > trick or dropping those convenient methods at all. > The latter would be faster for the interpreter and/or non inlined case. In core libraries we often make engineering decisions to use trickier or more verbose code for the sake of performance, but I think this is going over the line. I think you can rely on inlining of such small, always called, methods like checkPositionIndex. > ---- > > We will need a significant merge once I commit > related changes. > > > Maybe we could announce this on this list, so other's could decide, if they > hurry to commit there changes before, or have to do there own merge later. I run into lots of merge conflicts, but always with my own changes! I don't think we have a lot of contention. 
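
To make the checkPositionIndex trade-off discussed above concrete, here is a rough sketch of both variants side by side. Everything in it is an assumption for illustration: SketchBuilder is a hypothetical stand-in for AbstractStringBuilder, the method name checkPositionIndexReturningLength is invented to distinguish the quoted patch's variant, and the exception message wording is made up rather than taken from the patch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for AbstractStringBuilder; only the index check matters here. */
final class SketchBuilder {
    private char[] value = new char[16];
    private int count;

    /** Shared check: index must be a valid insertion position in [0, len]. */
    static void checkPositionIndex(int len, int index) {
        if (index < 0 || index > len)
            throw new StringIndexOutOfBoundsException(
                "index " + index + ", length " + len); // wording is illustrative only
    }

    /** Variant in the style of the quoted patch: hands the loaded count back to the caller. */
    int checkPositionIndexReturningLength(int index) {
        int len = count;
        checkPositionIndex(len, index);
        return len;
    }

    /** Variant suggested in the reply: plain void, relying on the JIT to inline it. */
    void checkPositionIndex(int index) {
        checkPositionIndex(count, index);
    }

    /** Inserts one char at the given position, using the void variant. */
    SketchBuilder insert(int index, char c) {
        checkPositionIndex(index);
        if (count == value.length)
            value = Arrays.copyOf(value, count * 2);
        System.arraycopy(value, index, value, index + 1, count - index);
        value[index] = c;
        count++;
        return this;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}
```

The void variant reads more naturally at the call site; whether the returning variant saves a field load in practice depends on the JIT and would need measuring.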
Martin

From forax at univ-mlv.fr Wed Mar 31 07:31:22 2010
From: forax at univ-mlv.fr (Rémi Forax)
Date: Wed, 31 Mar 2010 09:31:22 +0200
Subject: java.util.Pair
In-Reply-To: <4BB28A9C.10705@oracle.com>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
 <4BB28A9C.10705@oracle.com>
Message-ID: <4BB2FA4A.6090809@univ-mlv.fr>

On 31/03/2010 01:34, joe.darcy at Oracle.com wrote:
>
>
> On 3/30/2010 10:54 AM, Kevin Bourrillion wrote:
>> Pair is only a partial, flawed solution to a special case (n=2) of a
>> very significant problem: the disproportionate complexity of creating
>> value types in Java. I support addressing the underlying problem in
>> Java 8, and not littering the API with dead-end solutions like Pair.
>
> While I have sympathy with that conclusion, there is the
> side-effect of littering many APIs with the flotsam of lots of different
> classes named "Pair." My inclination would be to produce one adequate
> Pair class in the JDK to prevent the proliferation of yet more Pair
> classes in other code bases.
>
> I should know better than to take the bait, below is a first cut at
> java.util.Pair.

In equals, instanceof Pair should be instanceof Pair<?,?>.
Pair is a raw type.

getA()/getB() should be renamed to getFirst()/getSecond(),
according to their javadoc.

Objects.toString() is not necessary in Pair.toString() because
StringBuilder.append (in fact String.valueOf()) already returns "null"
for null.
And as a minor optimisation, ']' can be used instead of "]".

public String toString() {
    return "[" + a + ", " + b + ']';
}

>
> -Joe

Rémi

From Weijun.Wang at Sun.COM Wed Mar 31 07:32:12 2010
From: Weijun.Wang at Sun.COM (Weijun Wang)
Date: Wed, 31 Mar 2010 15:32:12 +0800
Subject: java.util.Pair
In-Reply-To: <4BB28A9C.10705@oracle.com>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
 <4BB28A9C.10705@oracle.com>
Message-ID: <1CA21FA1-0EE9-4973-A0B7-E02C1E8F9861@Sun.COM>

"implements Serializable"?
-Max

On Mar 31, 2010, at 7:34 AM, joe.darcy at Oracle.com wrote:

>
>
> On 3/30/2010 10:54 AM, Kevin Bourrillion wrote:
>> Pair is only a partial, flawed solution to a special case (n=2) of a very significant problem: the disproportionate complexity of creating value types in Java. I support addressing the underlying problem in Java 8, and not littering the API with dead-end solutions like Pair.
>
> While I have sympathy with that conclusion, there is the
> side-effect of littering many APIs with the flotsam of lots of different
> classes named "Pair." My inclination would be to produce one adequate
> Pair class in the JDK to prevent the proliferation of yet more Pair classes in other code bases.
>
> I should know better than to take the bait, below is a first cut at
> java.util.Pair.
>
> -Joe
>
> package java.util;
>
> import java.util.Objects;
>
> /**
>  * An immutable pair of values. The values may be null. The values
>  * themselves may be mutable.
>  *
>  * @param <A> the type of the first element of the pair
>  * @param <B> the type of the second element of the pair
>  *
>  * @since 1.7
>  */
> public final class Pair<A, B> {
>     private final A a;
>     private final B b;
>
>     private Pair(A a, B b) {
>         this.a = a;
>         this.b = b;
>     }
>
>     /**
>      * Returns a pair whose elements are the first and second
>      * arguments, respectively.
>      * @return a pair constructed from the arguments
>      */
>     public static <C, D> Pair<C, D> valueOf(C c, D d) {
>         // Don't mandate new values.
>         return new Pair<C, D>(c, d);
>     }
>
>     /**
>      * Returns the value of the first element of the pair.
>      * @return the value of the first element of the pair
>      */
>     public A getA() {
>         return a;
>     }
>
>     /**
>      * Returns the value of the second element of the pair.
>      * @return the value of the second element of the pair
>      */
>     public B getB() {
>         return b;
>     }
>
>     /**
>      * TBD
>      */
>     @Override
>     public String toString() {
>         return "[" + Objects.toString(a) +
>             ", " + Objects.toString(b) + "]";
>     }
>
>     /**
>      * TBD
>      */
>     @Override
>     public boolean equals(Object x) {
>         if (!(x instanceof Pair))
>             return false;
>         else {
>             Pair<?,?> that = (Pair<?,?>) x;
>             return
>                 Objects.equals(this.a, that.a) &&
>                 Objects.equals(this.b, that.b);
>         }
>     }
>
>     /**
>      * TBD
>      */
>     @Override
>     public int hashCode() {
>         return Objects.hash(a, b);
>     }
> }

From kevinb at google.com Wed Mar 31 15:34:58 2010
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 31 Mar 2010 08:34:58 -0700
Subject: java.util.Pair
In-Reply-To: <4BB2FA4A.6090809@univ-mlv.fr>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com>
 <4BB28A9C.10705@oracle.com>
 <4BB2FA4A.6090809@univ-mlv.fr>
Message-ID:

On Wed, Mar 31, 2010 at 12:31 AM, Rémi Forax wrote:

> In equals, instanceof Pair should be instanceof Pair<?,?>.
> Pair is a raw type.
>

Tangent: there are those of us who believe javac is quite mistaken to issue
a warning on 'instanceof Pair'. (And even if it were right in theory (which
I don't think it is), weren't warnings supposed to be things that would warn
you about possible *bugs*?)

--
Kevin Bourrillion @ Google
internal: http://goto/javalibraries
external: http://guava-libraries.googlecode.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From crazybob at crazybob.org Wed Mar 31 15:36:26 2010 From: crazybob at crazybob.org (Bob Lee) Date: Wed, 31 Mar 2010 08:36:26 -0700 Subject: java.util.Pair In-Reply-To: <4BB28A9C.10705@oracle.com> References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> <4BB28A9C.10705@oracle.com> Message-ID: On Tue, Mar 30, 2010 at 4:34 PM, wrote: > While I have sympathy with that conclusion, there is the > side-effect of littering many APIs with the flotsam of lots of different > classes named "Pair." My inclination would be to produce one adequate > Pair class in the JDK to prevent the proliferation of yet more Pair classes > in other code bases. Please don't add Pair. It should never be used in APIs. Adding it to java.util will enable and even encourage its use in APIs. The damage done to future Java APIs will be far worse than a few duplicate copies of Pair (I don't even see that many). I think we'll have a hard time finding use cases to back up this addition. Bob -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Wed Mar 31 16:14:59 2010 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 31 Mar 2010 09:14:59 -0700 Subject: java.util.Pair In-Reply-To: References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> <4BB28A9C.10705@oracle.com> Message-ID: On Wed, Mar 31, 2010 at 8:36 AM, Bob Lee wrote: Please don't add Pair. It should never be used in APIs. Adding it to > java.util will enable and even encourage its use in APIs. The damage done to > future Java APIs will be far worse than a few duplicate copies of Pair (I > don't even see that many). I think we'll have a hard time finding use cases > to back up this addition. 
> FYI, here are some examples of types you can look forward to seeing in Java code near you when you have a Pair class available: Pair,List>>> Map>>> Map>>> FJ.EmitFn>>>>> Processor>,Pair,List>>,List>> DoFn>>>,Pair>>>> These are all real examples found in real, live production code (simplified a little). There were only a scant few examples of this... caliber... that did not involve Pair. The problem is that classes like Pair simply go that much further to indulge the desire to never have to create any actual types of our own. When we're forced to create our own types, we begin to model our data more appropriately, which I believe leads us to create good abstractions at broader levels of granularity as well. -- Kevin Bourrillion @ Google internal: http://goto/javalibraries external: http://guava-libraries.googlecode.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mr at sun.com Wed Mar 31 16:25:35 2010 From: mr at sun.com (Mark Reinhold) Date: Wed, 31 Mar 2010 09:25:35 -0700 Subject: java.util.Pair In-Reply-To: kevinb@google.com; Wed, 31 Mar 2010 09:14:59 PDT; Message-ID: <20100331162535.88F56420@eggemoggin.niobe.net> > Date: Wed, 31 Mar 2010 09:14:59 -0700 > From: Kevin Bourrillion > ... > > The problem is that classes like Pair simply go that much further to indulge > the desire to never have to create any actual types of our own. When we're > forced to create our own types, we begin to model our data more appropriately, > which I believe leads us to create good abstractions at broader levels of > granularity as well. I agree. Java isn't Lisp. - Mark From jjb at google.com Wed Mar 31 16:40:55 2010 From: jjb at google.com (Joshua Bloch) Date: Wed, 31 Mar 2010 09:40:55 -0700 Subject: java.util.Pair In-Reply-To: <20100331162535.88F56420@eggemoggin.niobe.net> References: <20100331162535.88F56420@eggemoggin.niobe.net> Message-ID: Just to add my voice to the chorus, I think adding pair is seductive but ill-considered. 
Based on our experience at Google, I believe it makes a bad situation worse.

I do believe that Kevin's idea is worthy of exploration: in essence trying to
encapsulate all of the knowledge in Chapter 3 of Effective Java into the
language, so that creating a fully-functional value type is as simple as
naming its fields and providing their types. Of course the devil is in the
details, but this could be a very good thing.

Josh

On Wed, Mar 31, 2010 at 9:25 AM, Mark Reinhold wrote:

> > Date: Wed, 31 Mar 2010 09:14:59 -0700
> > From: Kevin Bourrillion
> > ...
> >
> > The problem is that classes like Pair simply go that much further to indulge
> > the desire to never have to create any actual types of our own. When we're
> > forced to create our own types, we begin to model our data more appropriately,
> > which I believe leads us to create good abstractions at broader levels of
> > granularity as well.
>
> I agree. Java isn't Lisp.
>
> - Mark

From forax at univ-mlv.fr Wed Mar 31 17:04:23 2010
From: forax at univ-mlv.fr (Rémi Forax)
Date: Wed, 31 Mar 2010 19:04:23 +0200
Subject: java.util.Pair
In-Reply-To:
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> <4BB28A9C.10705@oracle.com> <4BB2FA4A.6090809@univ-mlv.fr>
Message-ID: <4BB38097.5090107@univ-mlv.fr>

On 31/03/2010 17:34, Kevin Bourrillion wrote:
> On Wed, Mar 31, 2010 at 12:31 AM, Rémi Forax wrote:
>
>     In equals, instanceof Pair should be instanceof Pair<?,?>.
>     Pair is a raw type.
>
> Tangent: there are those of us who believe javac is quite mistaken to
> issue a warning on 'instanceof Pair'.

you're not the only one but I think you're wrong.

> (And even if it were right in theory (which I don't think it is),
> weren't warnings supposed to be things that would warn you about
> possible /bugs/?)
possible bug:
the semantics of instanceof Foo and instanceof Foo<?> is different if
generics will be reified.

Example:
class Foo { }

instanceof Foo and instanceof Foo<?> are equivalent.
Now suppose I change the definition of Foo to:
class Foo { }
I recompile the class Foo and forget to recompile that code:

Foo foobar = new Foo();
foobar instanceof Foo is ok but foobar instanceof Foo<?> must raise an
IncompatibleClassChangeError.

> --
> Kevin Bourrillion @ Google
> internal: http://goto/javalibraries
> external: http://guava-libraries.googlecode.com

Rémi

From kevinb at google.com Wed Mar 31 17:20:32 2010
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 31 Mar 2010 10:20:32 -0700
Subject: java.util.Pair
In-Reply-To: <4BB38097.5090107@univ-mlv.fr>
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> <4BB28A9C.10705@oracle.com> <4BB2FA4A.6090809@univ-mlv.fr> <4BB38097.5090107@univ-mlv.fr>
Message-ID:

On Wed, Mar 31, 2010 at 10:04 AM, Rémi Forax wrote:

> (And even if it were right in theory (which I don't think it is), weren't
> warnings supposed to be things that would warn you about possible *bugs*?)
>
> possible bug:
> the semantics of instanceof Foo and instanceof Foo<?> is different if
> generics will be reified.

With all due respect, I rest my case. :-)

(Meaning: since you chose such a hypothetical future situation as an
illustration, it suggests that indeed no actual bugs are being prevented here
in the real world.)

We have to recognize the fact that it is no small amount of the world's Java
code that would become broken if generics were ever reified. And, as well,
that -- no, I won't go on about this, because it's now a tangent of a tangent.

--
Kevin Bourrillion @ Google
internal: http://goto/javalibraries
external: http://guava-libraries.googlecode.com
From neal at gafter.com Wed Mar 31 17:34:23 2010
From: neal at gafter.com (Neal Gafter)
Date: Wed, 31 Mar 2010 10:34:23 -0700
Subject: java.util.Pair
In-Reply-To:
References: <108fcdeb1003301054v211788a8tdb28dcb24aa0d2e4@mail.gmail.com> <4BB28A9C.10705@oracle.com> <4BB2FA4A.6090809@univ-mlv.fr> <4BB38097.5090107@univ-mlv.fr>
Message-ID:

On Wed, Mar 31, 2010 at 10:20 AM, Kevin Bourrillion wrote:

> With all due respect, I rest my case. :-)
>
> (Meaning: since you chose such a hypothetical future situation as an
> illustration, it suggests that indeed no actual bugs are being prevented
> here in the real world.)
>
> We have to recognize the fact that it is no small amount of the world's
> Java code that would become broken if generics were ever reified. And, as
> well, that -- no, I won't go on about this, because it's now a tangent of a
> tangent.

That depends on how the reification is done. Reification as described in
<http://gafter.blogspot.com/2006/11/reified-generics-for-java.html> would
break no existing code.

From i30817 at gmail.com Wed Mar 31 19:57:38 2010
From: i30817 at gmail.com (Paulo Levi)
Date: Wed, 31 Mar 2010 20:57:38 +0100
Subject: java.util.Pair
Message-ID:

Please don't add this. I have my own parametric tuple class. In fact it is
easy to do.
http://code.google.com/p/bookjar-utils/source/browse/BookJar-utils/src/util/Tuples.java

However, I never use it anymore. It is easy to write and use, but really
stupid, since the names (first, second, third...) are so generic... and it has
no behavior. Invariably I have to replace it with a more domain-appropriate
class with real names and methods.

A real tuple (where the names don't matter...) might be usable generally, but
not a tuple-like normal type.
From martinrb at google.com Wed Mar 31 23:06:43 2010
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 31 Mar 2010 16:06:43 -0700
Subject: Wording improvements for String.indexOf, String.lastIndexOf
Message-ID:

Hi Alan, Xueming,

I'd like you to do a code review.

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/lastIndexOf2/

A colleague suggested wording improvements for String.indexOf and
String.lastIndexOf

At least, this makes the javadoc less gratuitously inconsistent.

Since I'm already coincidentally fixing lastIndexOf, I can either fold this
patch into lastIndexOf or you can file a separate bug - your choice.

Thanks,

Martin

From kevin.l.stern at gmail.com Wed Mar 31 23:36:13 2010
From: kevin.l.stern at gmail.com (Kevin L. Stern)
Date: Wed, 31 Mar 2010 18:36:13 -0500
Subject: A List implementation backed by multiple small arrays rather than the traditional single large array.
In-Reply-To: <1ccfd1c11003301520g564876fehfce57def62f6d6b3@mail.gmail.com>
References: <1704b7a21003280455u784d4d2ape39a47e2367b79a8@mail.gmail.com> <1ccfd1c11003290023u5c59f926o8ceb79fe0d3bbc6f@mail.gmail.com> <1704b7a21003300425i7dd1ef7he28728ad3cdb60e2@mail.gmail.com> <1ccfd1c11003301520g564876fehfce57def62f6d6b3@mail.gmail.com>
Message-ID:

What am I missing here? In "Resizable arrays in optimal time and space" the
authors define their data structure with the following property:

(1) "When superblock SB_k is fully allocated, it consists of 2^(floor(k/2))
data blocks, each of size 2^(ceil(k/2))."

Since the superblock is zero-based indexed this implies the following
structure:

SB_0: [1]
SB_1: [2]
SB_2: [2][2]
SB_3: [4][4]
SB_4: [4][4][4][4]
[...]

Let's have a look at Algorithm 3, Locate(i), with i = 3:

r = 100 (the binary expansion of i + 1)
k = |r| - 1 = 2
p = 2^k - 1 = 3

What concerns me is their statement that p represents "the number of data
blocks in superblocks prior to SB_k." There are only two data blocks in
superblocks prior to SB_2, not three.
Given (1) above, unless I'm misinterpreting it, the number of data blocks in
superblocks prior to SB_k should be:

2 * Sum[i=0->k/2-1] 2^i = 2 * (2^(k/2) - 1)

This, of course, seems to work out much better in my example above, giving
the correct answer to my interpretation of their data structure, but I have a
hard time believing that this is their mistake rather than my
misinterpretation.

Thoughts?

Kevin

On Tue, Mar 30, 2010 at 5:20 PM, Martin Buchholz wrote:

> On Tue, Mar 30, 2010 at 04:25, Kevin L. Stern wrote:
> > Hi Martin,
> >
> > Thanks much for your feedback. The first approach that comes to mind to
> > implement O(1) time front as well as rear insertion is to create a cyclic
> > list structure with a front/rear pointer - to insert at the front requires
> > decrementing the front pointer (modulo the size) and to insert at the rear
> > requires incrementing the rear pointer (modulo the size). We need to resize
> > when the two pointers bump into each other. Could you explain more about
> > your suggestion of introducing an arraylet that is shared by the front and
> > the rear?
>
> It was a half-baked idea - I don't know if there's a way to turn it into
> something useful. I was thinking of the ArrayDeque implementation,
> where all the elements live in a single array.
>
> > It's not clear to me how that would help and/or be a better
> > approach than the cyclic list. Anyhow, the paper that you reference,
> > "Resizable arrays in optimal time and space", gives a deque so if we take
> > that approach then the deque is specified.
>
> Technically, ArrayList also supports the Deque operations -
> just not efficiently.

From cadenza at paradise.net.nz Tue Mar 23 06:08:58 2010
From: cadenza at paradise.net.nz (Bruce Chapman & Barbara Carey)
Date: Tue, 23 Mar 2010 19:08:58 +1300
Subject: Kinda ?
In-Reply-To: <4BA7F97C.5070202@sun.com>
References: <1ccfd1c11003201136u78e159ew88724bfa5a9e28c0@mail.gmail.com> <4BA9256A.2020602@sun.com> <4BA7F6EB.6040804@gmx.de> <4BA7F97C.5070202@sun.com>
Message-ID: <4BA85AFA.70005@paradise.net.nz>

Paul Hohensee wrote:
> "in a way" plus "somewhat", as in "it's kinda bad" == "in a way, it's
> somewhat bad".
>
> On 3/22/10 7:02 PM, Ulf Zibis wrote:
>> Can somebody betray the sense of "Kinda" to me?
>>
>> -Ulf
>>
>>
>

a spoken contraction of "kind of" (similar meaning to sorta, a contraction
of sort-of)

nothing to do with children (kinder) although you might sometimes see it
spelt that way too.

Bruce

From tom.hawtin at oracle.com Tue Mar 30 11:15:37 2010
From: tom.hawtin at oracle.com (tom.hawtin at oracle.com)
Date: Tue, 30 Mar 2010 12:15:37 +0100
Subject: java.util.Pair
In-Reply-To:
References:
Message-ID: <4BB1DD59.6000007@oracle.com>

On 30/03/2010 09:08, Weijun Wang wrote:
> I know such a simple thing can be made very complex and everyone might
> want to add a new method into it. How about we just make it most
> primitive? Simply an immutable and Serializable class, two final
> fields, one constructor, two getters (?), and no static factory
> methods.

Even with the diamond operator, I'd prefer a static creation method to a
constructor. Immutable value classes really should not have constructors.

I'd also like to support Comparable for Comparables.

> (S)he who does the real implementation has the privilege to
> choose between head/tail and car/cdr.

Or are you suggesting an abstract base class to support two-field
immutables? IMO, a good idea from a strong-typing perspective, but lazy
programmers will probably want a concrete pair or they'll keep
implementing their own.

Tom
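For readers following the thread, here is a minimal sketch of the shape
discussed above: an immutable, Serializable two-field class with a static
creation method instead of a public constructor, and ordering offered only
when both components are Comparable. Every name in it (Pair, of, first,
second, naturalOrder) is a hypothetical chosen for illustration; nothing here
was proposed for java.util. It uses lambda syntax, which postdates this
thread, and the comparator assumes non-null components.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.Objects;

// Hypothetical illustration only; not a proposed JDK class.
final class Pair<A, B> implements Serializable {
    private static final long serialVersionUID = 1L;

    private final A first;
    private final B second;

    // Private constructor: callers go through the static creation method.
    private Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    // Static creation method; type arguments are inferred at the call site.
    public static <A, B> Pair<A, B> of(A first, B second) {
        return new Pair<>(first, second);
    }

    public A first() { return first; }

    public B second() { return second; }

    // "Comparable for Comparables": an ordering is available only when both
    // component types are Comparable. Compares first, then second.
    // Assumes neither component is null.
    public static <A extends Comparable<? super A>, B extends Comparable<? super B>>
            Comparator<Pair<A, B>> naturalOrder() {
        return (x, y) -> {
            int c = x.first.compareTo(y.first);
            return c != 0 ? c : x.second.compareTo(y.second);
        };
    }

    @Override
    public boolean equals(Object o) {
        // Unbounded wildcards are the only parameterized form instanceof accepts.
        if (!(o instanceof Pair<?, ?>)) {
            return false;
        }
        Pair<?, ?> other = (Pair<?, ?>) o;
        return Objects.equals(first, other.first)
                && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }

    @Override
    public String toString() {
        return "(" + first + ", " + second + ")";
    }
}
```

Even in this form the accessors carry no domain meaning, which is the
objection raised elsewhere in the thread: a caller of first() and second()
learns nothing about what the two values represent.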