ICFP 2008 post mortem
Well, team Foognostic slightly exceeded expectations, but I think only by luck. The goal was to write a fully functional client, and the hope was to make it navigate around a grid. The goal was accomplished -- the code installed, received commands, and sent commands exactly as specified. The hope was not realized... a few rushed hours produced some really, really wrong code for navigating.
Round one had no aliens, and the code actually found home base once. This was purely a coincidence of where the rover was started and the worse-than-random dain-bramaged movement code. The other three times the timer expired. So we barely squeaked by round one.
Round two went as it should have. Five trials -- three timeouts and two alien encounters. Not nearly good enough to advance.
I'd like to blame lack of time and to be fair, that weekend was uncommonly busy. But the real failure was not raising a team. Parts of the task could be dealt with in parallel... next time I'm going for the full court press -- so if you know me and like to code -- watch out ![]()
A whirlwind tour of Bloom filters
What's a Bloom filter?
Traits of a hash map key set
- After an item has been stored, its existance will be known. It will never go missing. There is no chance of saying an item has not been stored after it has been stored.
- After an item has been stored, it may be identified as a different item. This happens when the hash function generates the same value for distinct data. So there is a risk of falsely confirming item B has been stored after only storing item A.
- The space required to hold these keys will be the product of the key size and key count. While key size is probably fixed, key count is variable and will lead to more space needed as more items are added.
Introducing Bloom filters
A Bloom filter shares traits 1 and 2 of a hash map key set, but not trait 3. The space required by a Bloom filter does not increase as more items are stored. Rather than processing data with one hash function and using the result as a lookup value, a Bloom filter uses multiple hash functions to identify booleans in a fixed size array.Bloom filter traits
- Same as hash map key set trait 1
- Same as hash map key set trait 2
- Storage space is small, but depends on the desired false positive rate
- Lookup time depends on number of hash functions; can be run in parallel
- Items cannot be removed
Code
This tacky, easily improvable code was released to the public domain and promises nothing.class MediocreBloomFilter M = 16 # number of boolean fields in filter K = 3 # number of hash functions to find boolean fields attr_accessor :v # the filter, a vector of boolean fields def initialize() @v = Array.new(M, 0) end def track_item(item) indicies_for?(item).each do |i| @v[i] = 1 end end def item_tracked?(item) hits = 0 indicies_for?(item).each do |i| hits = hits + @v[i] end hits == K end def indicies_for?(item) indicies = [] md5 = Digest::MD5.new hash = "" md5.update(item.to_s) hash = md5.hexdigest K.times do |k| indicies << HEX.index(hash[k].chr.to_s.upcase) end indicies end HEX = [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F" ] end
- Line 7 sets the filter storage to 16 integers. It could have been two bytes if I wanted to do bitwise operations.
- Line 24 runs the hash functions against the target data. Since this is tacky sample code only one hash function is executed (line 30), and multiple slices are taken from the value and converted into array indicies (at line 33).
- Using integers rather than booleans has a nice side effect. It permits removal of items by using the integers as reference counts. I haven't tested it, but minor changes should be needed around lines 12 and 19, and a removal function to decrement the counts.
- item: "abc"
- md5: 900150983cd24fb0d6963f7d28e17f72
- indicies: [9, 0, 0]
- filter: [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- item: "abd"
- md5: 4911e516e5aa21d327512e0c8b197616
- indicies: [4, 9, 1]
- filter: [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- item: "seth schroeder"
- md5: 2c38f5fd13873c139567551ca1b7496e
- indicies: [2, 12, 3]
- filter: [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
References
- Using Bloom Filters
- Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
- Burton Bloom, Space/time trade-offs in hash coding with allowable errors, CACM, 13(7):422-426, July 1970.
- Paul E. Black, "Bloom filter", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 16 May 2008. Available from: http://www.nist.gov/dads/HTML/bloomFilter.html


