RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
Roger Riggs
Roger.Riggs at oracle.com
Fri Sep 4 15:07:55 UTC 2020
Hi Corey,
The idea I had in mind is refactoring the fast path into the method you
call decodeBlock.
Base64: lines 751-768.
It leaves all the unknown/illegal character handling to the Java code.
And yes, it does not need to handle MIME, except to return on illegal
characters.
The patch is attached.
Regards, Roger
On 8/31/20 6:22 PM, Corey Ashford wrote:
> On 8/29/20 1:19 PM, Corey Ashford wrote:
>> Hi Roger,
>>
>> Thanks for your reply and thoughts! Comments interspersed below:
>>
>> On 8/28/20 10:54 AM, Roger Riggs wrote:
> ...
>>> Comparing with the way that the Base64 encoder was intrinsified, the
>>> method that is intrinsified should have a method body that does
>>> the same function, so it is interchangable. That likely will just
>>> shift
>>> the "fast path" code into the decodeBlock method.
>>> Keeping the symmetry between encoder and decoder will
>>> make it easier to maintain the code.
>>
>> Good point. I'll investigate what this looks like in terms of the
>> actual code, and will report back (perhaps in a new webrev).
>>
>
> Having looked at this again, I don't think it makes sense. One thing
> that differs significantly from the encodeBlock intrinsic is that the
> decodeBlock intrinsic only needs to process a prefix of the data, and
> so it can leave virtually any amount of data at the end of the src
> buffer unprocessed, where as with the encodeBlock intrinsic, if it
> exists, it must process the entire buffer.
>
> In the (common) case where the decodeBlock intrinsic returns not
> having processed everything, it still needs to call the Java code, and
> if that Java code is "replaced" by the intrinsic, it's inaccessible.
>
> Is there something I'm overlooking here? Basically I want the decode
> API to behave differently than the encode API, mostly to make the
> arch-specific intrinsic easier to implement. If that's not acceptable,
> then I need to rethink the API, and also figure out how to deal with
> the illegal character case. The latter could perhaps be done by
> throwing an exception from the intrinsic, or maybe by returning a
> negative length that specifies the index of the illegal src byte, and
> then have the Java code throw the exception).
>
> Regards,
>
> - Corey
>
-------------- next part --------------
diff --git a/src/java.base/share/classes/java/util/Base64.java b/src/java.base/share/classes/java/util/Base64.java
index 34b39b18a54..e2b3a686d70 100644
--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -741,6 +741,27 @@ public class Base64 {
return 3 * (int) ((len + 3L) / 4) - paddings;
}
+ // Fast path for full 4 byte -> 3 byte conversion w/o errors
+ private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, boolean isURL) {
+ int[] base64 = isURL ? fromBase64URL : fromBase64;
+ int dp = 0;
+ int sl0 = sp + ((sl - sp) & ~0b11);
+ while (sp < sl0) {
+ int b1 = base64[src[sp++] & 0xff];
+ int b2 = base64[src[sp++] & 0xff];
+ int b3 = base64[src[sp++] & 0xff];
+ int b4 = base64[src[sp++] & 0xff];
+ if ((b1 | b2 | b3 | b4) < 0) { // non base64 byte
+ return dp;
+ }
+ int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;
+ dst[dp++] = (byte)(bits0 >> 16);
+ dst[dp++] = (byte)(bits0 >> 8);
+ dst[dp++] = (byte)(bits0);
+ }
+ return dp;
+ }
+
private int decode0(byte[] src, int sp, int sl, byte[] dst) {
int[] base64 = isURL ? fromBase64URL : fromBase64;
int dp = 0;
@@ -749,23 +770,34 @@ public class Base64 {
while (sp < sl) {
if (shiftto == 18 && sp + 4 < sl) { // fast path
- int sl0 = sp + ((sl - sp) & ~0b11);
- while (sp < sl0) {
- int b1 = base64[src[sp++] & 0xff];
- int b2 = base64[src[sp++] & 0xff];
- int b3 = base64[src[sp++] & 0xff];
- int b4 = base64[src[sp++] & 0xff];
- if ((b1 | b2 | b3 | b4) < 0) { // non base64 byte
- sp -= 4;
- break;
- }
- int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;
- dst[dp++] = (byte)(bits0 >> 16);
- dst[dp++] = (byte)(bits0 >> 8);
- dst[dp++] = (byte)(bits0);
- }
- if (sp >= sl)
- break;
+ int dl = decodeBlock(src, sp, sl, dst, isURL);
+ /*
+ * Calculate how many characters were processed by how many
+ * bytes of data were returned.
+ */
+
+ /*
+ * Base64 characters always come in groups of four,
+ * producing three bytes of binary data (except for on
+ * the final four-character piece where it can produce
+ * one to three data bytes depending on how many fill
+ * characters there - zero, one, or two). The only
+ * case where there should be a non-multiple of three
+ * returned is if the intrinsic has processed all of
+ * the characters passed to it. At this point in the
+ * logic, however, we know the instrinsic hasn't
+ * processed all of the chracters.
+ *
+ * Round dl down to the nearest three-byte boundary.
+ */
+ dl = (dl / 3) * 3;
+
+ // Recalculate chars_decoded based on the rounded dl
+ int chars_decoded = (dl / 3) * 4;
+
+ sp += chars_decoded;
+ dp += dl;
+ continue;
}
int b = src[sp++] & 0xff;
if ((b = base64[b]) < 0) {
More information about the core-libs-dev
mailing list