RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding

Fri Sep 4 15:07:55 UTC 2020

Hi Corey,

The idea I had in mind is refactoring the fast path into the method you 
call decodeBlock.
Base64: lines 751-768.

It leaves all the unknown/illegal character handling to the Java code.
And yes, it does not need to handle MIME, except to return on illegal 
characters.

The patch is attached.

Regards, Roger



On 8/31/20 6:22 PM, Corey Ashford wrote:
> On 8/29/20 1:19 PM, Corey Ashford wrote:
>> Hi Roger,
>>
>> Thanks for your reply and thoughts!  Comments interspersed below:
>>
>> On 8/28/20 10:54 AM, Roger Riggs wrote:
> ...
>>> Comparing with the way that the Base64 encoder was intrinsified, the
>>> method that is intrinsified should have a method body that does
>>> the same function, so it is interchangable.  That likely will just 
>>> shift
>>> the "fast path" code into the decodeBlock method.
>>> Keeping the symmetry between encoder and decoder will
>>> make it easier to maintain the code.
>>
>> Good point.  I'll investigate what this looks like in terms of the 
>> actual code, and will report back (perhaps in a new webrev).
>>
>
> Having looked at this again, I don't think it makes sense.  One thing 
> that differs significantly from the encodeBlock intrinsic is that the 
> decodeBlock intrinsic only needs to process a prefix of the data, and 
> so it can leave virtually any amount of data at the end of the src 
> buffer unprocessed, where as with the encodeBlock intrinsic, if it 
> exists, it must process the entire buffer.
>
> In the (common) case where the decodeBlock intrinsic returns not 
> having processed everything, it still needs to call the Java code, and 
> if that Java code is "replaced" by the intrinsic, it's inaccessible.
>
> Is there something I'm overlooking here?  Basically I want the decode 
> API to behave differently than the encode API, mostly to make the 
> arch-specific intrinsic easier to implement. If that's not acceptable, 
> then I need to rethink the API, and also figure out how to deal with 
> the illegal character case.  The latter could perhaps be done by 
> throwing an exception from the intrinsic, or maybe by returning a 
> negative length that specifies the index of the illegal src byte, and 
> then have the Java code throw the exception).
>
> Regards,
>
> - Corey
>

-------------- next part --------------

diff --git a/src/java.base/share/classes/java/util/Base64.java b/src/java.base/share/classes/java/util/Base64.java
index 34b39b18a54..e2b3a686d70 100644
--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -741,6 +741,27 @@ public class Base64 {
             return 3 * (int) ((len + 3L) / 4) - paddings;
         }
 
+        // Fast path for full 4 byte -> 3 byte conversion w/o errors
+        private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, boolean isURL) {
+            int[] base64 = isURL ? fromBase64URL : fromBase64;
+            int dp = 0;
+            int sl0 = sp + ((sl - sp) & ~0b11);
+            while (sp < sl0) {
+                int b1 = base64[src[sp++] & 0xff];
+                int b2 = base64[src[sp++] & 0xff];
+                int b3 = base64[src[sp++] & 0xff];
+                int b4 = base64[src[sp++] & 0xff];
+                if ((b1 | b2 | b3 | b4) < 0) {    // non base64 byte
+                    return dp;
+                }
+                int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;
+                dst[dp++] = (byte)(bits0 >> 16);
+                dst[dp++] = (byte)(bits0 >>  8);
+                dst[dp++] = (byte)(bits0);
+            }
+            return dp;
+        }
+
         private int decode0(byte[] src, int sp, int sl, byte[] dst) {
             int[] base64 = isURL ? fromBase64URL : fromBase64;
             int dp = 0;
@@ -749,23 +770,34 @@ public class Base64 {
 
             while (sp < sl) {
                 if (shiftto == 18 && sp + 4 < sl) {       // fast path
-                    int sl0 = sp + ((sl - sp) & ~0b11);
-                    while (sp < sl0) {
-                        int b1 = base64[src[sp++] & 0xff];
-                        int b2 = base64[src[sp++] & 0xff];
-                        int b3 = base64[src[sp++] & 0xff];
-                        int b4 = base64[src[sp++] & 0xff];
-                        if ((b1 | b2 | b3 | b4) < 0) {    // non base64 byte
-                            sp -= 4;
-                            break;
-                        }
-                        int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;
-                        dst[dp++] = (byte)(bits0 >> 16);
-                        dst[dp++] = (byte)(bits0 >>  8);
-                        dst[dp++] = (byte)(bits0);
-                    }
-                    if (sp >= sl)
-                        break;
+                    int dl = decodeBlock(src, sp, sl, dst, isURL);
+                    /*
+                     * Calculate how many characters were processed by how many
+                     * bytes of data were returned.
+                     */
+
+                    /*
+                     * Base64 characters always come in groups of four,
+                     * producing three bytes of binary data (except for on
+                     * the final four-character piece where it can produce
+                     * one to three data bytes depending on how many fill
+                     * characters there - zero, one, or two).  The only
+                     * case where there should be a non-multiple of three
+                     * returned is if the intrinsic has processed all of
+                     * the characters passed to it.  At this point in the
+                     * logic, however, we know the instrinsic hasn't
+                     * processed all of the chracters.
+                     *
+                     * Round dl down to the nearest three-byte boundary.
+                     */
+                    dl = (dl / 3) * 3;
+
+                    // Recalculate chars_decoded based on the rounded dl
+                    int chars_decoded = (dl / 3) * 4;
+
+                    sp += chars_decoded;
+                    dp += dl;
+                    continue;
                 }
                 int b = src[sp++] & 0xff;
                 if ((b = base64[b]) < 0) {