RFR: 8316681: Rewrite URLEncoder.encode to use small reusable buffers [v6]

温绍锦 duke at openjdk.org
Thu Oct 5 00:32:26 UTC 2023


On Fri, 22 Sep 2023 08:53:07 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> `URLEncoder` currently appends chars that need encoding into a `java.io.CharArrayWriter`, converts that to a `String`, uses `String::getBytes` to get the encoded bytes and then appends these bytes in an escaped manner to the output stream. This is somewhat inefficient.
>> 
>> This PR replaces the `CharArrayWriter` with a reusable `CharBuffer` + `ByteBuffer` pair. This allows us to encode to the output `StringBuilder` in small chunks, with greatly reduced allocation as a result.
>> 
>> The exact size of the buffers is an open question, but generally it seems that a tiny buffer wins by virtue of allocating less, and that the per chunk overheads are relatively small.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/java.base/share/classes/java/net/URLEncoder.java
>   
>   Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com>

The BitSet-based URLEncoder#DONT_NEED_ENCODING is effectively just a lookup table. Should we consider replacing it with a pair of long bitmasks, along these lines?


public class URLEncoder {
    // Bit i of FLAGS_0 is set if char i (0 - 63) needs no encoding;
    // bit i of FLAGS_1 covers char i + 64 (64 - 127).
    static final long DONT_NEED_ENCODING_FLAGS_0;
    static final long DONT_NEED_ENCODING_FLAGS_1;

    static {
        long flags0 = 0;
        flags0 |= 1L << ' '; // ASCII 32
        flags0 |= 1L << '*'; // ASCII 42
        flags0 |= 1L << '-'; // ASCII 45
        flags0 |= 1L << '.'; // ASCII 46

        // ASCII 48 - 57
        for (int i = '0'; i <= '9'; ++i) {
            flags0 |= 1L << i;
        }
        DONT_NEED_ENCODING_FLAGS_0 = flags0;

        long flags1 = 0;
        // ASCII 65 - 90
        for (int i = 'A'; i <= 'Z'; ++i) {
            flags1 |= 1L << (i - 64);
        }
        flags1 |= 1L << ('_' - 64); // ASCII 95
        // ASCII 97 - 122
        for (int i = 'a'; i <= 'z'; ++i) {
            flags1 |= 1L << (i - 64);
        }
        DONT_NEED_ENCODING_FLAGS_1 = flags1;
    }

    private static boolean dontNeedEncoding(char c) {
        int prefix = c >> 6;
        if (prefix > 1) {
            // c >= 128: outside both masks, always needs encoding
            return false;
        }
        long flags = prefix == 0 ? DONT_NEED_ENCODING_FLAGS_0 : DONT_NEED_ENCODING_FLAGS_1;
        // A long shift only uses the low 6 bits of the count, so 1L << c selects bit (c & 63)
        return (flags & (1L << c)) != 0;
    }
}
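
To sanity-check the idea, a small standalone sketch like the following could compare the two-mask lookup against a BitSet over every possible char value. The class name `EncodingMaskSketch` and the helpers `referenceTable` and `countMismatches` are hypothetical, and the reference BitSet is simply built from the same characters as the snippet above, so this only shows that the two representations agree with each other, not that either matches what URLEncoder currently ships.

```java
import java.util.BitSet;

// Hypothetical standalone class; not part of the JDK or of the PR.
final class EncodingMaskSketch {
    // Bits for chars 0..63 and 64..127 respectively.
    static final long FLAGS_0;
    static final long FLAGS_1;

    static {
        long flags0 = 0;
        for (char c : new char[] {' ', '*', '-', '.'}) {
            flags0 |= 1L << c;
        }
        for (int i = '0'; i <= '9'; i++) {
            flags0 |= 1L << i;
        }
        FLAGS_0 = flags0;

        long flags1 = 1L << ('_' - 64);
        for (int i = 'A'; i <= 'Z'; i++) {
            flags1 |= 1L << (i - 64);
        }
        for (int i = 'a'; i <= 'z'; i++) {
            flags1 |= 1L << (i - 64);
        }
        FLAGS_1 = flags1;
    }

    // Two-mask lookup, same logic as the proposal above.
    static boolean dontNeedEncoding(char c) {
        int prefix = c >> 6;
        if (prefix > 1) {
            return false; // c >= 128
        }
        long flags = prefix == 0 ? FLAGS_0 : FLAGS_1;
        // Shift count is masked to its low 6 bits for long operands.
        return (flags & (1L << c)) != 0;
    }

    // BitSet holding the same character set, used only as a reference.
    static BitSet referenceTable() {
        BitSet bits = new BitSet(128);
        bits.set('a', 'z' + 1);
        bits.set('A', 'Z' + 1);
        bits.set('0', '9' + 1);
        bits.set(' ');
        bits.set('-');
        bits.set('_');
        bits.set('.');
        bits.set('*');
        return bits;
    }

    // Counts chars for which the two representations disagree.
    static int countMismatches() {
        BitSet ref = referenceTable();
        int mismatches = 0;
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            if (ref.get(c) != dontNeedEncoding((char) c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Checking the full char range rather than just 0 to 127 also exercises the `prefix > 1` guard, which is what keeps a non-ASCII char from aliasing onto an ASCII bit through the masked shift.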

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15865#issuecomment-1747842588

