Reduce your way into good Clojure
Standard disclaimer: Clojure is a new hobby.
In my previous post I discussed replacing for loops with the (map) function. That works, until it doesn't.
Another step needed to generate the cosine similarity for two strings is to create a frequency histogram: a count of how many times each character pair occurs in a string. This is a pretty good fit for a hash map, where the keys are the character pairs and the values are the occurrence counts.
Here was my initial try. This code meant to do well... but went far, far away from where I wanted. Let's take a detailed look at my failings:
(let [hm (hash-map)]
  (map
    (fn [key val]
      (assoc hm key val))
    ["a" "b" "a"]
    [1 2 3]))
What I wanted: a map like this => {"b" 2, "a" 3}. What I got was three hash maps, each with one key/value pair => ({"a" 1} {"b" 2} {"a" 3}). The intent was to use map to iterate over the sequences, and use assoc to put the key and value into one hash map. And now, for the parade of errors...
- Use the zipmap function to build hashmaps like this:
  (zipmap ["a" "b" "a"] [1 2 3]) => {"b" 2, "a" 3}
- Variables in the let block cannot be changed...
- ... but I really needed to update the hash map defined in the let block.
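The let-block points are easy to check at the REPL. A minimal sketch (the name `updated` is only for illustration): `assoc` never changes the map it is given, it returns a new one, which is why every call in the attempt above produced its own one-entry map.

```clojure
;; assoc returns a new map and leaves the original untouched.
(let [hm      (hash-map)
      updated (assoc hm "a" 1)]
  [hm updated])
;; => [{} {"a" 1}]
```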
Rather than continue to stew uselessly, I used Emacs to hop into the #clojure IRC channel. A wonderful person suggested the reduce function; I'd used it once in Ruby, where it is best known as inject. I'm going to skimp on my description of reduce a little since that article is so well done.
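reduce iterates over a collection like map, but it also passes a context (an accumulator) to each callback. The context is not mutated in place: whatever the callback returns becomes the context for the next call, and the return value of reduce is the final value of the context. I am still coming up to speed on it, obviously.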
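One way to watch that context being handed from call to call is `reductions` from clojure.core, which returns every intermediate value rather than only the last. A small sketch with made-up numbers:

```clojure
;; reduce returns only the final accumulated value.
(reduce + 0 [1 2 3 4])
;; => 10

;; reductions returns the starting value followed by each intermediate result.
(reductions + 0 [1 2 3 4])
;; => (0 1 3 6 10)
```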
Anyhow, enough yammering. Here is reduce in action:
(defn dot-product [l_histo r_histo]
  (reduce
    (fn [product key]
      (+ product
         (* (get l_histo key 0)
            (get r_histo key 0))))
    0
    (keys l_histo)))
Key points:
- [l_histo r_histo] -- these are the arguments to the dot-product function.
- (fn defines an anonymous function.
- (get gets the value for the specified key from the specified map, returning the final argument when the key is absent.
- 0 is the initial value for product.
- (keys returns the keys of the specified hash map.
- Clojure really wins a lot by delegating to Java. This happened for free:
  (Math/sqrt 100) => 10.0
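Putting those pieces together, here is a rough sketch of how the cosine similarity itself might be computed from two histograms. `magnitude` and `cosine-similarity` are names assumed for illustration, and `dot-product` is restated so the block runs on its own. The formula is the standard one: the dot product divided by the product of the two magnitudes.

```clojure
(defn dot-product
  "Sum of the products of matching counts; missing keys count as 0."
  [l_histo r_histo]
  (reduce
    (fn [product key]
      (+ product
         (* (get l_histo key 0)
            (get r_histo key 0))))
    0
    (keys l_histo)))

(defn magnitude
  "Euclidean length of a histogram's counts (hypothetical helper)."
  [histo]
  (Math/sqrt (dot-product histo histo)))

(defn cosine-similarity
  "Dot product over the product of magnitudes (hypothetical helper).
  Assumes neither histogram is empty, to avoid dividing by zero."
  [l_histo r_histo]
  (/ (dot-product l_histo r_histo)
     (* (magnitude l_histo) (magnitude r_histo))))

;; Identical histograms point the same way.
(cosine-similarity {"ab" 3, "bc" 4} {"ab" 3, "bc" 4})
;; => 1.0

;; Histograms with no shared keys are orthogonal.
(cosine-similarity {"ab" 3} {"cd" 4})
;; => 0.0
```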
So, reduce is a critical step for people coming from imperative programming languages looking to do basic things with collections.
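As one more illustration, the histogram from the start of this post could be built with reduce and no mutation at all. This is only a minimal sketch: `char-pairs` and `histogram` are hypothetical names, and it assumes that "character pairs" means overlapping two-character windows.

```clojure
(defn char-pairs
  "Overlapping two-character strings of s (hypothetical helper)."
  [s]
  (map #(apply str %) (partition 2 1 s)))

(defn histogram
  "Count how often each item occurs, threading the map through reduce."
  [items]
  (reduce
    (fn [counts item]
      ;; assoc returns a new map; reduce hands it to the next call.
      (assoc counts item (inc (get counts item 0))))
    {}
    items))

(histogram (char-pairs "abab"))
;; => {"ab" 2, "ba" 1}
```

clojure.core also ships `frequencies`, which does this counting for you, but writing it out by hand shows where the accumulator goes.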


