Long.bitCount micro-optimization
Isaac Levy
isaac.r.levy at gmail.com
Tue May 9 00:06:33 UTC 2017
AMD started shipping popcnt instruction in 2007, and intel in 2008.
Does the OpenJDK runtime include a bitCount intrinsic? I can't speak
much to relevance in this regard.
But in terms of performance,
http://dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html
gives a 45% speedup from current jdk algorithm (called "Wikipedia #2")
to the propsed one ("Wikipedia #3"). Benchmark source:
http://dalkescientific.com/writings/diary/popcnt.cpp
The multiply is also the suggested algorithm in the linked PDF (page
195 of http://support.amd.com/techdocs/25112.pdf), an official AMD
optimization guide.
Isaac
On Mon, May 8, 2017 at 4:29 PM, Doug Lea <dl at cs.oswego.edu> wrote:
> On 05/08/2017 02:14 PM, Isaac Levy wrote:
>>
>> Original message below:
>>
>> The JDK impl of bitCount can be improved -- though most users will get
>> the hotspot intrinsic. The best source I could find for the suggestion
>> below is page 195: http://support.amd.com/techdocs/25112.pdf
>>
>
> The int version differs from current implementation in that it uses
> one multiply instead of two shift+adds. (Similarly for long.)
>
> I wonder if there is any processor that does not already have a
> bit-count instruction for which this is actually faster?
> Some evidence either way would be helpful.
>
> -Doug
>
>
>
>
>> Cheers,
>> Isaac Levy
>>
>>
>> Proposed Long.bitCount and Integer.bitCount:
>>
>>
>> public static int bitCount(long i)
>> {
>> i -= (i >>> 1) & 0x5555555555555555L;
>> i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);
>> i = (i + (i >>> 4)) & 0x0f0f0f0f0f0f0f0fL;
>> return (i * 0x0101010101010101L) >>> 56;
>> }
>>
>>
>> public static int bitCount(int i) {
>> i -= (i >>> 1) & 0x55555555;
>> i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
>> i = (i + (i >>> 4)) & 0x0f0f0f0f;
>> return (i * 0x01010101) >>> 24;
>> }
>>
>
More information about the core-libs-dev
mailing list