RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32
David Holmes
david.holmes at oracle.com
Mon May 27 04:38:54 UTC 2013
Hi David,
On 24/05/2013 2:06 AM, David Chase wrote:
> So where do we stand on this?
> Can we call it a bug and eligible for inclusion?
> And are there other issues to deal with?
I think this form of the optimization is more amenable to inclusion.
> I would be happy to discuss exactly what is going on in the mysterious
> intrinsic-ful code, if that is an issue for anyone (and sooner is better than
> later, else I'll forget the exciting details).
I'm not going to attempt to understand any of the details of this. That
said I still have an issue with this code:
147 if (buf) {
148 /* Don't know for sure how big an unsigned long is,
therefore
149 copy one at a time. */
150 int i;
151 for (i = 0; i < 256; i++) buf[i] = (jint) (crc_table[i]);
152 (*env)->ReleasePrimitiveArrayCritical(env, b, buf, 0);
153 }
buf is an array of 32-bit values; crc_table is either 32-bit or 64-bit
depending on platform. So on 64-bit we are truncating all the values in
crc_table. So presumably these values all fit in 32-bits anyway?
Minor nit:
src/share/classes/java/util/zip/CRC32.java
+ import java.lang.reflect.Field;
I don't think this import is needed now.
---
test/java/util/zip/TimeChecksum.java
You added eg:
time(adler32, data, 2*iters, 16); // warmup short case, too
which is presumably to ensure both of the primary paths are hit during
the warm up. But it isn't obvious to me that the actual test will hit
the short case ??
Thanks,
David
> David
>
> On 2013-05-21, at 5:15 PM, David Chase <david.r.chase at oracle.com> wrote:
>
>>
>> http://cr.openjdk.java.net/~drchase/7088419/webrev.03/
>>
>> Newer, slimmer webrev. No fork/join, the related code is removed (except the
>> native init routine still returns a boolean, which is currently ignored, indicating if
>> it supports the faster crc32).
>>
>> What remains is:
>>
>> 1) no-JNI fast-path for short CRCs and Adlers
>> 2) reformulated faster-Adler for byte-at-a-time
>> 3) For 32/64 Linux/Solaris/MacOS x86 Sandy Bridge or better (with CLMUL and AVX),
>> fast code for CRCs.
>>
>> For now it uses assembly language, the C-with-intrinsics is included, and compiles
>> with current clang and gcc. It tickles a bug in Solaris Studio 12.3 which has been reported.
>> Optimization for Windows is disabled because the assembler syntax is too different
>>
>> The code has been tested by-hand across Linux (32), Solaris (32 and 64), MacOS (64)
>> and has also passed JPRT (which is to say, the failures were unrelated, hand examination
>> of the runs showed that the new CRCandAdler test was successful. Anyone checking my
>> most recent run will notice that I am not very good at specifying tests to run, but there is
>> an earlier run. I'm trying again, just to see if I can figure out how to make it behave.)
>>
>> There is a companion webrev on the hotspot side that sets the secret property
>> that is used and reset by this code.
>>
>> I hope this will be more easily regarded as a "bug fix" and less as an interface change.
>>
>> David
>>
>> On 2013-05-20, at 8:34 PM, David Holmes <david.holmes at oracle.com> wrote:
>>
>>> On 21/05/2013 3:45 AM, Alan Bateman wrote:
>>>> On 20/05/2013 14:49, David Chase wrote:
>>>>> Suppose I split this bug (i.e., file a new bug) into the
>>>>> Intel-acceleration part and the fork-join part.
>>>>> :
>>>>>
>>>> This make sense.
>>>
>>> I agree.
>>>
>>> I also don't have an issue with eliding the optimization on Windows for the time being.
>>>
>>> David
>>>
>>>> -Alan.
>>
>
More information about the core-libs-dev
mailing list