RFR: 8314774: Optimize URLEncoder

Claes Redestad redestad at openjdk.org
Thu Aug 24 09:50:40 UTC 2023


On Wed, 23 Aug 2023 18:51:37 GMT, Daniel Fuchs <dfuchs at openjdk.org> wrote:

> The fast path that just returns the given string if ASCII-only and no encoding looks simple enough. I don't particularly like the idea of embedding the logic of encoding UTF-8 into that class though, that increases the complexity significantly, and Charset encoders are there for that. Also I don't understand the reason for changing BitSet into a boolean array - that seems gratuitous?

Perhaps the key performance difference between the `BitSet` and the `boolean[]` in this code is that the latter is `static final @Stable` and thus easy for the JIT to optimize. The `words` array held by a `BitSet` is neither `final` nor `@Stable`, so the JIT likely needs to keep a few extra checks around every access.
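To make the comparison concrete, here is a minimal sketch of the two lookup styles side by side. The class name `LookupSketch`, the helper names and the exact character set are illustrative assumptions, not the code in the PR; `@Stable` is JDK-internal, so it only appears in a comment here.

```java
import java.util.BitSet;

public class LookupSketch {
    // BitSet variant: the long[] inside the BitSet is a mutable, non-final field,
    // so the JIT cannot treat its contents as constants.
    static final BitSet BITS = new BitSet(128);

    // boolean[] variant: inside java.base this field could additionally be
    // annotated @Stable (assumption: not available to application code).
    static final boolean[] FLAGS = new boolean[128];

    static {
        for (char c = 'a'; c <= 'z'; c++) mark(c);
        for (char c = 'A'; c <= 'Z'; c++) mark(c);
        for (char c = '0'; c <= '9'; c++) mark(c);
        for (char c : "-_.*".toCharArray()) mark(c);
    }

    // Records the character in both tables so the two lookups stay equivalent.
    private static void mark(char c) {
        BITS.set(c);
        FLAGS[c] = true;
    }

    static boolean viaBitSet(char c) {
        return BITS.get(c);
    }

    static boolean viaArray(char c) {
        return c < FLAGS.length && FLAGS[c];
    }
}
```

Both methods answer the same question; the difference discussed above is only in how much the JIT can assume about the backing storage.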

An interesting experiment would be to instead model this as a `ConstantBitSet` with a `final @Stable` internal array. This could get most (or all?) of the benefit while keeping things at a higher abstraction level and allowing for some reuse. Retaining the compactness of `BitSet`s is nice too, though that might not be very important for constant bit sets.

The API would need to be worked out, but something like adding a public method `BitSet::asConstant` and hiding away the details might be a good starting point:

public BitSet asConstant() {
  return new ConstantBitSet(this);
}

private static class ConstantBitSet extends BitSet {
  @Stable
  private final long[] words;
  private ConstantBitSet(BitSet bitSet) {
    // snapshot only the words in use so later mutation of the source is not observed
    words = Arrays.copyOf(bitSet.words, bitSet.wordsInUse);
  }
  // override all BitSet methods, make mutating methods throw (IllegalStateException?)
  // -- for a public API perhaps extract an interface
}
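Outside of `java.util` the same idea can be tried with public API only. The following is a rough, standalone sketch: the class name `ConstantBits` and its factory method `of` are hypothetical, it does not extend `BitSet`, and it cannot use `@Stable`, so it illustrates the immutable-snapshot shape rather than the JIT benefit.

```java
import java.util.BitSet;

public final class ConstantBits {
    // Inside the JDK this field would be the candidate for @Stable.
    private final long[] words;

    private ConstantBits(long[] words) {
        this.words = words;
    }

    // BitSet.toLongArray() returns a fresh copy, so later changes to
    // the source BitSet are not visible through this snapshot.
    public static ConstantBits of(BitSet source) {
        return new ConstantBits(source.toLongArray());
    }

    public boolean get(int bitIndex) {
        if (bitIndex < 0) {
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
        }
        int wordIndex = bitIndex >> 6; // 64 bits per long
        // The shift count of a long shift uses only the low 6 bits of bitIndex.
        return wordIndex < words.length
                && (words[wordIndex] & (1L << bitIndex)) != 0;
    }
}
```

Exposing only read methods sidesteps the question of what the mutating methods should throw, at the cost of not being usable where a `BitSet` is expected.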

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1691364488

