8219196: DataOutputStream.writeUTF may throw unexpected exceptions
Roger Riggs
Roger.Riggs at oracle.com
Mon Mar 18 18:35:46 UTC 2019
Hi,
Collapsing a couple of emails.
On 03/18/2019 01:43 PM, Martin Buchholz wrote:
> Hmmmm ....
>
> I took another look at this and Roger tickled my UTF-8 optimization
> spidey sense. Rewritten using Martin style:
>
> static int writeUTF(String str, DataOutput out) throws IOException {
>     int strlen = str.length();
>     int utflen = strlen; // optimized for ASCII
>
>     for (int i = 0; i < strlen; i++) {
>         int c = str.charAt(i);
>         if (c >= 0x80 || c == 0)
>             utflen += (c >= 0x800) ? 2 : 1;
>     }
@Martin, you'd need utflen to be a long to avoid the original overflow-to-negative
problem for an input string length > MAXINT / 3. And that might be the cleanest,
compared to adding extra conditions in the loop.
Otherwise, this code looks good; it might optimize to fewer branches
than the original.
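To make the wraparound concrete (a standalone sketch, not part of the webrev): with utflen as an int, a worst-case input of MAXINT/3 + 1 characters that each need three bytes wraps negative, so the "utflen > 65535" check never fires; widened to long, the count stays correct and is rejected as it should be.

```java
public class UtflenOverflowDemo {
    public static void main(String[] args) {
        int strlen = Integer.MAX_VALUE / 3 + 1;   // 715_827_883 chars
        // Worst case: every char encodes to 3 bytes in modified UTF-8.
        int utflenInt = strlen * 3;               // overflows int
        long utflenLong = (long) strlen * 3;      // correct byte count

        System.out.println(utflenInt);            // -2147483647 (wrapped)
        System.out.println(utflenLong);           // 2147483649

        // The negative int slips past the length check; the long does not.
        System.out.println(utflenInt > 65535);    // false (the bug)
        System.out.println(utflenLong > 65535);   // true  (the fix)
    }
}
```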
>
>     if (utflen > 65535)
>         throw new UTFDataFormatException(
>             "encoded string too long: " + utflen + " bytes");
>
>     final byte[] bytearr;
>     if (out instanceof DataOutputStream) {
>         DataOutputStream dos = (DataOutputStream)out;
>         if (dos.bytearr == null || (dos.bytearr.length < (utflen + 2)))
>             dos.bytearr = new byte[(utflen*2) + 2];
>         bytearr = dos.bytearr;
>     } else {
>         bytearr = new byte[utflen + 2];
>     }
>
>     int count = 0;
>     bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
>     bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);
>
>     int i = 0;
>     for (i = 0; i < strlen; i++) { // optimized for initial run of ASCII
>         int c = str.charAt(i);
>         if (c >= 0x80 || c == 0) break;
>         bytearr[count++] = (byte) c;
>     }
>
>     for (; i < strlen; i++) {
>         int c = str.charAt(i);
>         if (c < 0x80 && c != 0) {
>             bytearr[count++] = (byte) c;
>         } else if (c >= 0x800) {
>             bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
>             bytearr[count++] = (byte) (0x80 | ((c >> 6) & 0x3F));
>             bytearr[count++] = (byte) (0x80 | ((c >> 0) & 0x3F));
>         } else {
>             bytearr[count++] = (byte) (0xC0 | ((c >> 6) & 0x1F));
>             bytearr[count++] = (byte) (0x80 | ((c >> 0) & 0x3F));
>         }
>     }
>     out.write(bytearr, 0, utflen + 2);
>     return utflen + 2;
> }
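One way the counting pass above could be adjusted per the long-vs-int comment (a sketch, not the actual JDK patch; countUTFBytes is a hypothetical helper name):

```java
import java.io.UTFDataFormatException;

public class UtfLength {
    // Length pass with utflen widened to long so a huge input
    // cannot wrap negative before the 65535 check.
    static int countUTFBytes(String str) throws UTFDataFormatException {
        int strlen = str.length();
        long utflen = strlen; // long: immune to int overflow
        for (int i = 0; i < strlen; i++) {
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0)            // NUL and non-ASCII need more
                utflen += (c >= 0x800) ? 2 : 1; // 3-byte vs 2-byte form
        }
        if (utflen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: " + utflen + " bytes");
        return (int) utflen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countUTFBytes("abc"));    // 3: one byte per char
        System.out.println(countUTFBytes("\u0000")); // 2: NUL is two bytes
        System.out.println(countUTFBytes("\u20AC")); // 3: three-byte form
    }
}
```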
>
>
> On Mon, Mar 18, 2019 at 6:41 AM Roger Riggs <Roger.Riggs at oracle.com> wrote:
>
> Hi Brian,
>
> On 03/15/2019 05:06 PM, Brian Burkhalter wrote:
>> Updated: http://cr.openjdk.java.net/~bpb/8219196/webrev.01/
>
> I'm at odds with Martin on including utflen in the loop. (Or that's not
> what he meant.)
> Now it has to do two comparisons for every input byte instead of one.
>
> There is no need to have an early exit for input strings that are too long.
> It is a very unusual case, probably never hit except in testing.
>
> There might be a case for checking that strlen <= 65535
> (even at 1 input byte per output byte, the result would be too large).
> But even that is probably not worth the compare and branch.
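The pre-check being weighed there (and judged probably not worth the branch) might look something like this; checkLength is a hypothetical name, illustrative only:

```java
import java.io.UTFDataFormatException;

public class EarlyLengthCheck {
    // Hypothetical early rejection: modified UTF-8 needs at least one
    // byte per char, so strlen > 65535 must overflow the two-byte length
    // field and can be rejected before counting any bytes.
    static void checkLength(String str) throws UTFDataFormatException {
        if (str.length() > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: at least " + str.length() + " bytes");
    }

    public static void main(String[] args) throws Exception {
        checkLength("short string");           // within the limit: no exception
        try {
            checkLength("x".repeat(65536));    // too long even at 1 byte/char
        } catch (UTFDataFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```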
>
The test only needs enough memory to create the input string (MAXINT/3 + 1 chars)
and a bit more for the default-sized ByteArrayOutputStream,
so it should run in 2G configurations.
So yes, removing the enabled = false is ok.
Roger
>
>>
>>>> Instead of disabling the test statically, you could make it conditional
>>>> on Runtime.maxMemory, but the test will fail quickly anyway.
>>
>> For this I simply added the requirement to the jtreg tags instead.
> With the fix, it will throw immediately before any allocation,
> so there is no need for a restriction on memory size.
> The only time that would be necessary would be running the test
> on a runtime without the fix.
>
> The @requires >4g restriction is unnecessary.
>
> $.02, Roger
>
>
More information about the core-libs-dev mailing list