RFR: 8314774: Optimize URLEncoder

Wed Aug 23 23:58:28 UTC 2023

On Wed, 23 Aug 2023 18:51:37 GMT, Daniel Fuchs <dfuchs at openjdk.org> wrote:

>  I don't particularly like the idea of embedding the logic of encoding UTF-8 into that class though, that increases the complexity significantly, and Charset encoders are there for that.

Unfortunately, `CharsetEncoder` is too generic for this. Because we know the target encoding is UTF-8, implementing the encoding inline eliminates unnecessary temporary objects. Some places already do this, such as `String`.
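
To illustrate the temporary objects in question, here is a minimal sketch of the generic `CharsetEncoder` route. The class and method names are hypothetical, and this is not necessarily how `URLEncoder` drives its encoder; it only shows which objects a charset-agnostic path tends to allocate for a single conversion.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not code from the JDK or from the PR.
final class GenericEncoderSketch {
    static byte[] encodeWithCharsetEncoder(String s) {
        // Temporary 1: a fresh encoder (encoders are stateful, so they are not shared).
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        // Temporary 2: a CharBuffer view over the input.
        CharBuffer in = CharBuffer.wrap(s);
        try {
            // Temporary 3: the ByteBuffer that encode() allocates for the result.
            ByteBuffer out = encoder.encode(in);
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // encode() reports malformed input (e.g. an unpaired surrogate) by default.
            throw new IllegalArgumentException(e);
        }
    }
}
```

An inline UTF-8 path can write the bytes straight into its destination and skip all three temporaries.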

I'm thinking we might be able to extract this logic into a static helper class.

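public class UTF8EncodeUtils {
    // True if c encodes to a single UTF-8 byte (ASCII).
    public static boolean isSingleByte(char c) { return c < 0x80; }

    // True if c encodes to at most two bytes; check isSingleByte first.
    public static boolean isDoubleBytes(char c) { return c < 0x800; }

    // Packs the two UTF-8 bytes of c into the low 16 bits, first byte highest.
    public static int encodeDoubleBytes(char c) {
        byte b0 = (byte) (0xc0 | (c >> 6));
        byte b1 = (byte) (0x80 | (c & 0x3f));
        // Mask every byte: continuation bytes are negative and would sign-extend.
        return ((b0 & 0xff) << 8) | (b1 & 0xff);
    }

    // Packs the three UTF-8 bytes of c into the low 24 bits.
    // The caller must handle surrogates before calling this.
    public static int encodeThreeBytes(char c) {
        byte b0 = (byte) (0xe0 | (c >> 12));
        byte b1 = (byte) (0x80 | ((c >> 6) & 0x3f));
        byte b2 = (byte) (0x80 | (c & 0x3f));
        return ((b0 & 0xff) << 16) | ((b1 & 0xff) << 8) | (b2 & 0xff);
    }

    // Packs the four UTF-8 bytes of a supplementary code point into an int.
    public static int encodeCodePoint(int uc) {
        byte b0 = (byte) (0xf0 | (uc >> 18));
        byte b1 = (byte) (0x80 | ((uc >> 12) & 0x3f));
        byte b2 = (byte) (0x80 | ((uc >> 6) & 0x3f));
        byte b3 = (byte) (0x80 | (uc & 0x3f));
        return ((b0 & 0xff) << 24) | ((b1 & 0xff) << 16) | ((b2 & 0xff) << 8) | (b3 & 0xff);
    }
}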

We can use this helper class to reimplement `String` and the UTF-8 `CharsetEncoder` (after we make sure it has no overhead), then use it to implement more UTF-8 fast paths.
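
As a rough sketch of what such a fast path could look like, the following percent-encodes a string by unpacking the packed ints one byte at a time. Everything here is hypothetical: the class name, `appendPacked`, and the simplified rule that only ASCII letters and digits pass through unescaped (the real `URLEncoder` rules differ, e.g. space becomes `+`). The pack methods are repeated so the block stands alone, and an unpaired surrogate is replaced with `?`, which is an assumption about the desired behaviour.

```java
// Hypothetical sketch; names and escaping rules are illustrative, not from the PR.
final class Utf8PercentEncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Pack the UTF-8 bytes into an int, first byte in the highest position used.
    static int encodeDoubleBytes(char c) {
        return ((0xc0 | (c >> 6)) << 8) | (0x80 | (c & 0x3f));
    }

    static int encodeThreeBytes(char c) {
        return ((0xe0 | (c >> 12)) << 16) | ((0x80 | ((c >> 6) & 0x3f)) << 8) | (0x80 | (c & 0x3f));
    }

    static int encodeCodePoint(int cp) {
        return ((0xf0 | (cp >> 18)) << 24) | ((0x80 | ((cp >> 12) & 0x3f)) << 16)
                | ((0x80 | ((cp >> 6) & 0x3f)) << 8) | (0x80 | (cp & 0x3f));
    }

    // Appends %XX for each of the `count` packed bytes, most significant first.
    private static void appendPacked(StringBuilder sb, int packed, int count) {
        for (int shift = (count - 1) * 8; shift >= 0; shift -= 8) {
            int b = (packed >>> shift) & 0xff;
            sb.append('%').append(HEX[b >> 4]).append(HEX[b & 0xf]);
        }
    }

    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 3);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiAlnum(c)) {
                sb.append(c);                                   // safe character, no escaping
            } else if (c < 0x80) {
                appendPacked(sb, c, 1);                         // other ASCII: one byte
            } else if (c < 0x800) {
                appendPacked(sb, encodeDoubleBytes(c), 2);
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, s.charAt(++i)); // consume the pair
                appendPacked(sb, encodeCodePoint(cp), 4);
            } else if (Character.isSurrogate(c)) {
                appendPacked(sb, '?', 1);                       // unpaired surrogate (assumed policy)
            } else {
                appendPacked(sb, encodeThreeBytes(c), 3);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, `percentEncode("a b")` yields `a%20b` and `percentEncode("\u20ac")` yields `%E2%82%AC`, without creating any `CharBuffer`, `ByteBuffer`, or encoder object along the way.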

I've also been doing some work on `OutputStreamWriter` recently. By implementing a fast path for UTF-8, I'm seeing speedups of over 20x in some cases, so I think we may be able to get significant improvements in more places.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1690789474