Hamming distances

The Hamming distance hamming(a,b) between two strings a and b of equal length is the number of positions where the strings differ. For example,
hamming(01101,11001) = 2.

Consider the following problem: Given a list of n bit strings, each of length k, calculate the minimum Hamming distance between two strings in the list. For example, the answer for [00111,01101,11110] is 2, because
  • hamming(00111,01101) = 2,
  • hamming(00111,11110) = 3, and
  • hamming(01101,11110) = 3.

A straightforward way to solve the problem is to go through all pairs of strings and calculate their Hamming distances, which yields an O(n^2 k) time algorithm. The following function can be used to calculate distances:
int hamming(const string& a, const string& b) {
    // a and b are assumed to have the same length k
    int d = 0;
    for (size_t i = 0; i < a.size(); i++) {
        if (a[i] != b[i]) d++;
    }
    return d;
}
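The search over all pairs can be sketched as follows. This is only an illustration under stated assumptions: the names hamming_str and min_hamming_str are hypothetical, and returning -1 when the list has fewer than two strings is a convention chosen here, not something the text specifies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of positions where two equal-length strings differ.
int hamming_str(const std::string& a, const std::string& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] != b[i]) d++;
    }
    return d;
}

// Minimum Hamming distance over all pairs of strings in v.
// Assumed convention: returns -1 if v has fewer than two strings.
int min_hamming_str(const std::vector<std::string>& v) {
    int best = -1;
    for (std::size_t i = 0; i < v.size(); i++) {
        for (std::size_t j = i + 1; j < v.size(); j++) {
            int d = hamming_str(v[i], v[j]);  // O(k) per pair
            if (best == -1 || d < best) best = d;
        }
    }
    return best;
}
```

Calling min_hamming_str on the list [00111,01101,11110] from the example above returns 2. The two nested loops visit each of the n(n-1)/2 pairs once, and each comparison takes O(k) time, which gives the O(n^2 k) bound.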
However, if k is small, we can optimize the code by storing the bit strings as integers and calculating the Hamming distances using bit operations. In particular, if k ≤ 32, we can just store the strings as int values and use the following function to calculate distances:
int hamming(int a, int b) {
    return __builtin_popcount(a^b);
}
In the above function, the xor operation constructs a bit string that has one bits exactly in the positions where a and b differ. The number of one bits is then counted using the __builtin_popcount function, which is a built-in of the GCC and Clang compilers.
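A minimal sketch of the whole bit-optimized search might look like this. The helper names parse_bits and min_hamming_bits are hypothetical, the code assumes that unsigned is at least 32 bits wide and that every string has length at most 32, and it relies on the GCC/Clang built-in __builtin_popcount.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Convert a string of '0'/'1' characters (length at most 32) to an integer.
unsigned parse_bits(const std::string& s) {
    unsigned x = 0;
    for (char c : s) x = (x << 1) | (c == '1' ? 1u : 0u);
    return x;
}

// Minimum Hamming distance over all pairs, using xor and popcount.
// Assumed convention: returns -1 if v has fewer than two strings.
int min_hamming_bits(const std::vector<std::string>& v) {
    std::vector<unsigned> x;
    for (const auto& s : v) x.push_back(parse_bits(s));  // convert once, up front
    int best = -1;
    for (std::size_t i = 0; i < x.size(); i++) {
        for (std::size_t j = i + 1; j < x.size(); j++) {
            int d = __builtin_popcount(x[i] ^ x[j]);  // one bits mark differing positions
            if (best == -1 || d < best) best = d;
        }
    }
    return best;
}
```

Each string is converted to an integer only once, so the inner loop does a constant amount of work per pair instead of a loop over k characters.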
To compare the implementations, we generated a list of 10000 random bit strings of length 30. Using the first approach, the search took 13.5 seconds, and after the bit optimization, it took only 0.5 seconds. Thus, the bit-optimized code was about 27 times faster than the original code.