RFR: 8170785: Excessive allocation in ParseUtil.encodePath

Claes Redestad claes.redestad at oracle.com
Tue Jan 3 22:52:55 UTC 2017


Hi Roger,

On 2017-01-03 22:22, Roger Riggs wrote:
> Hi Claes,
>
> So Windows didn't suffer from having the '\' separator.

relatively speaking Windows won't see any speed-up when dealing
with file system paths, but most places where we see encodePath in
profiles is actually not files: jar/runtime image paths, HTTP etc...

Thus Windows shouldn't be left too far behind for typical workloads. :-)

>
> ParseUtil:
>
> firstEncodeIndex:121:
>   'a' - 'z' seems more frequent than '/' or '.'; does it improve the
> stats to move that range to the beginning of the if.
>   (yes the compiler can re-order).
>
> line 125:
>     Since 127 is known to need encoding it could be >= 0x007f
>
> line  136: I suppose the arraycopy intrinsic already optimizes length == 0;
>
> Line 134:  I question the math on  * 2 + 16 -index;  (But this is
> pre-existing code)
>    if there were lots of characters that needed encoding it might be
> possible to overflow the array since 1 char is replaced by at least 3
> and up to 9.
>    16 seems like a questionable fudge factor; but perhaps it has not
> been a problem in practice.

As the main objective here is to get rid of the allocations while not
regressing throughput when they can't be avoided I resisted *most*
urges to micro-optimize. :-)

The weird heuristic with len * 2 + 16 is ugly, yes, but I prefer to
leave it mostly alone and perhaps revisit this for a throughput
performance enhancement in the future.

'/' can actually be rather frequent in paths, and while '.' is likely
less common and should have been inserted after a-z, adding the check
improved performance of encoding to-be-rather-common paths like
"/java.base/bla/bla/bla/" remarkably while not visibly regressing
anything else.

What is clear from micros is that having to load and call into the
BitSet has a relatively large overhead compared to an extra comparison
(which is likely why there are separate checks for a-z, A-Z and 0-9 in
the first place).

I should actually see if we can't remove the BitSet entirely: I think
the java.net.URI approach with final long masks and a simple shift and
check might be more efficient and relatively easy to duplicate here.

Thanks!

/Claes

>
> $.02, Roger
>
>
>
> On 1/3/2017 9:46 AM, Claes Redestad wrote:
>> Hi,
>>
>> some users reports high allocation rates in ParseUtil.encodePath,
>> regardless of whether paths actually need to be encoded or not.
>> Since this is a commonly used utility it makes sense.
>>
>> Webrev: http://cr.openjdk.java.net/~redestad/8170785/webrev.01
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8170785
>>
>> This patch provides a semantically neutral fast-path for cases when
>> the path does not need to be encoded (up to 5x speedup), reduces
>> allocation when the string has a prefix that does not need to be
>> encoded (1-2x speedup) and no regression when using a separator
>> that's not '/' or the first char needs encoding.
>>
>> Interpreted performance is not affected much either: small positive
>> when no encoding is needed, neutral or negligible regression
>> otherwise.
>>
>> Thanks!
>>
>> /Claes
>


More information about the core-libs-dev mailing list