Question: Fast bitwise operations on a long

Question

Fast bitwise operations on a long

Answers 3
Added at 2017-01-04 19:01
Tags
Question

I know I can write the following method to calculate the set bit indexes of a long :

private static List<Integer> bitPositions(long number) {
    final List<Integer> positions = new ArrayList<>();
    int position = 1;
    while (number != 0) {
        if ((number & 1L) != 0) {
            positions.add(position);
        }
        position++;
        number = number >>> 1;
    }
    return positions;
}

My question is : Is there a faster method to do this ?

Answers to

Fast bitwise operations on a long

nr: #1 dodano: 2017-01-04 20:01

Another answer

Note: BitBank's answer to this question is about twice as fast as the following two methods.

Improved method

Here's a method that produces the same results (just in a byte[] instead of a List<Integer>) about twice as fast:

private static final byte[] bitPositions(long n) {
    final byte[] result = new byte[Long.bitCount(n)];

    int i = 0;
    for (byte bit = 1; n != 0L; bit++) {
        if ((n & 1L) != 0) result[i++] = bit;
        n >>>= 1;
    }

    return result;
}

I'd recommend changing byte bit = 1 in the for loop to byte bit = 0 to switch to the traditional method of numbering bit positions starting with zero instead of one.

Improvements

  • Precomputing the needed capacity with Long.bitCount(n) (uses the very speedy "popcount" instruction of your processor) speeds up your method quite a bit. You can change this by making the ArrayList using new ArrayList<>(Long.bitCount(n)).
  • Using an ArrayList<Integer> is slower than a byte[], because:
    • Time must be wasted looking up low-valued (-127 to 128) Integer values from the Integer cache to put them into the ArrayList.
    • Time must be wasted when using the ints stored in the resulting List<Integer> later on because you have to both retrieve the Integer from the List<Integer> and then retrieve the int from the Integer.
  • A byte[] uses about 1/4th (32-bit system) or 1/8th (64-bit system) the memory of an ArrayList<Integer>, since bytes are that much smaller than pointers to Integers.

Slightly faster, but uglier method

As suggested by another person's deleted answer, loop unrolling speeds things up a little bit further on my machine (check whether that's true on your machine as well):

private static final byte[] bitPositions(final long n) {
    final byte[] result = new byte[Long.bitCount(n)];

    int i = 0;
    if ((n &                    1L) != 0L) result[i++] = 1;
    if ((n &                    2L) != 0L) result[i++] = 2;
    if ((n &                    4L) != 0L) result[i++] = 3;
    if ((n &                    8L) != 0L) result[i++] = 4;
    if ((n &                   16L) != 0L) result[i++] = 5;
    if ((n &                   32L) != 0L) result[i++] = 6;
    if ((n &                   64L) != 0L) result[i++] = 7;
    if ((n &                  128L) != 0L) result[i++] = 8;
    if ((n &                  256L) != 0L) result[i++] = 9;
    if ((n &                  512L) != 0L) result[i++] = 10;
    if ((n &                 1024L) != 0L) result[i++] = 11;
    if ((n &                 2048L) != 0L) result[i++] = 12;
    if ((n &                 4096L) != 0L) result[i++] = 13;
    if ((n &                 8192L) != 0L) result[i++] = 14;
    if ((n &                16384L) != 0L) result[i++] = 15;
    if ((n &                32768L) != 0L) result[i++] = 16;
    if ((n &                65536L) != 0L) result[i++] = 17;
    if ((n &               131072L) != 0L) result[i++] = 18;
    if ((n &               262144L) != 0L) result[i++] = 19;
    if ((n &               524288L) != 0L) result[i++] = 20;
    if ((n &              1048576L) != 0L) result[i++] = 21;
    if ((n &              2097152L) != 0L) result[i++] = 22;
    if ((n &              4194304L) != 0L) result[i++] = 23;
    if ((n &              8388608L) != 0L) result[i++] = 24;
    if ((n &             16777216L) != 0L) result[i++] = 25;
    if ((n &             33554432L) != 0L) result[i++] = 26;
    if ((n &             67108864L) != 0L) result[i++] = 27;
    if ((n &            134217728L) != 0L) result[i++] = 28;
    if ((n &            268435456L) != 0L) result[i++] = 29;
    if ((n &            536870912L) != 0L) result[i++] = 30;
    if ((n &           1073741824L) != 0L) result[i++] = 31;
    if ((n &           2147483648L) != 0L) result[i++] = 32;
    if ((n &           4294967296L) != 0L) result[i++] = 33;
    if ((n &           8589934592L) != 0L) result[i++] = 34;
    if ((n &          17179869184L) != 0L) result[i++] = 35;
    if ((n &          34359738368L) != 0L) result[i++] = 36;
    if ((n &          68719476736L) != 0L) result[i++] = 37;
    if ((n &         137438953472L) != 0L) result[i++] = 38;
    if ((n &         274877906944L) != 0L) result[i++] = 39;
    if ((n &         549755813888L) != 0L) result[i++] = 40;
    if ((n &        1099511627776L) != 0L) result[i++] = 41;
    if ((n &        2199023255552L) != 0L) result[i++] = 42;
    if ((n &        4398046511104L) != 0L) result[i++] = 43;
    if ((n &        8796093022208L) != 0L) result[i++] = 44;
    if ((n &       17592186044416L) != 0L) result[i++] = 45;
    if ((n &       35184372088832L) != 0L) result[i++] = 46;
    if ((n &       70368744177664L) != 0L) result[i++] = 47;
    if ((n &      140737488355328L) != 0L) result[i++] = 48;
    if ((n &      281474976710656L) != 0L) result[i++] = 49;
    if ((n &      562949953421312L) != 0L) result[i++] = 50;
    if ((n &     1125899906842624L) != 0L) result[i++] = 51;
    if ((n &     2251799813685248L) != 0L) result[i++] = 52;
    if ((n &     4503599627370496L) != 0L) result[i++] = 53;
    if ((n &     9007199254740992L) != 0L) result[i++] = 54;
    if ((n &    18014398509481984L) != 0L) result[i++] = 55;
    if ((n &    36028797018963968L) != 0L) result[i++] = 56;
    if ((n &    72057594037927936L) != 0L) result[i++] = 57;
    if ((n &   144115188075855872L) != 0L) result[i++] = 58;
    if ((n &   288230376151711744L) != 0L) result[i++] = 59;
    if ((n &   576460752303423488L) != 0L) result[i++] = 60;
    if ((n &  1152921504606846976L) != 0L) result[i++] = 61;
    if ((n &  2305843009213693952L) != 0L) result[i++] = 62;
    if ((n &  4611686018427387904L) != 0L) result[i++] = 63;
    if ((n & -9223372036854775808L) != 0L) result[i++] = 64;

    return result;
}

You can also change this to start counting the bit positions at zero instead of one.

Improvements

  • Avoids the need to repeatedly perform >>> on the number.
  • Avoids the need to repeatedly perform ++ on the bit positon.
  • Avoids needing to check whether the number has reached zero.
  • Avoids branch mispredictions.
nr: #2 dodano: 2017-01-04 22:01

Here I used a simple approach, sublinear time. You decide if you count from left or from right, in printf("%ld\n", 32-pos) you can consider 32-pos or pos.

bit_pos(unsigned long x)
{
    unsigned long pos;
    unsigned long w;
    while(x) {
        /* extract the rightmost `1` in `w` then remove it */
        w = x&-x;
        x-=w;

        if (w) {
            /* compute number of trailing zeros (location of 1) for w */
            pos = 1;
            if (!(w >> 16)) {pos+=16;w<<=16;}
            if (!(w >> 24)) {pos+= 8;w<<= 8;}
            if (!(w >> 28)) {pos+= 4;w<<= 4;}
            if (!(w >> 30)) {pos+= 2;w<<= 2;}
            pos = pos - (w >> 31);
            printf("%ld\n", 32-pos);
        }
    }
    printf("\n");
}

main()
{
    bit_pos(2UL+4+32+1024);
    bit_pos(3456789UL);
}
nr: #3 dodano: 2017-01-05 16:01

If you don't mind using intrinsics, you can have an even faster version. Long.numberOfTrailingZeros() will use the CPU intrinsic that counts the number of consecutive zero bits starting from the least-significant bit (the BSF instruction on x86 processors).

For a sparse value, this will be faster than all other looping methods because it doesn't have any conditionals or branches within the main loop, it skips runs of any number of 0's with a single iteration, and, for a 64-bit long, the BSF intrinsic has a latency of only 3 clock cycles on Intel Haswell CPUs.

private static final byte[] bitPositions(long n) {
    final byte[] result = new byte[Long.bitCount(n)];

    byte bitPosition = 0;
    for (int i = 0; n != 0L; i++) {
        final byte bitsToSkip = (byte) (Long.numberOfTrailingZeros(n) + 1);
        n >>>= bitsToSkip;
        bitPosition += bitsToSkip;
        result[i] = bitPosition;
    }

    return result;
}
Source Show
◀ Wstecz