I wanted a macro, but needed recursion

It’s easier for me to start things than complete them. That seemed like bad news when I just couldn’t figure out how to generalize a small bit of Clojure code. One long week went by as I repeatedly got very close but couldn’t quite solve the problem. While eventually I figured it out, the process of getting there made me look forward to writing more Clojure.

My last exploration of Clojure taught me to avoid for loops while splitting a string into overlapping character pairs. This exploration into Clojure was going to generalize that code to emit any width of overlapping character chunks.

Here is the old, inflexible code:

(defn split-bigrams [text]
  (map
    (fn [l r] (str l r))
    (seq text)
    (rest (seq text))))

A quick summary of that:

  • Defines a function named split-bigrams
  • It accepts one untyped parameter, here named text
  • The method body contains one function call to map
  • map takes three arguments
    1. One anonymous function taking two arguments
    2. Two sequences
  • map pulls one character from each sequence and passes them to the anonymous function, which turns them into a string

The trick? Two iterators over one string, with the second iterator starting at the second character. Then just iterate over the lists and mash the items together. Problem is, to get character triplets I would need a third iterator starting at the third character. That’s no way to live! Just a big fat FALE WHALE.

I desperately hoped that macros could somehow magically generate all the lists for map. I like macros. Clojure likes macros! My poor little train of thought couldn’t scale this mountain. I thought I could, I thought I could, but between tucking in kids and dog walks I was too tired/busy. So I gave up and looked for another option.

Howzabout recursion? I spent several days rewriting variants of this:

(defn take-chunk [n seq]
  (when (nthnext seq (- n 1))
    (take n seq)))
(defn nice-try [n str]
  (when-let [chars (take-chunk n str)]
    (prn chars)
    (recur n (rest str))))

This prints the values I want… it just doesn’t put them into a collection! I feebly tried incorporating map and reduce. Not sure how many infinite loops were created and cancelled…

Finally, I decided to read the source code of Clojure to see how they do things like this. That was a great decision. core.clj is clean, clear, and full of goodness. Somewhere in there I found recur being used how I needed it: do something to a collection, call itself with the output until the input was exhausted, and return the aggregated output:

(defn ngrams [n chars ngrams]
  (let [ngram (take-chunk n chars)]
    (if ngram
      (recur n (rest chars) (cons ngram ngrams))
      ngrams)))

This is probably a basic recursion pattern. I feel a little silly taking so long to figure it out. Just a little silly though, as the coding I do for a living doesn’t encourage recursion.

At this point I am almost convinced that Clojure is the optimal language for me to grow as a programmer. Lisp itself is a low-level high-level language; you have to really understand how things are going to work, but not worry about things like garbage collection. The “homegrown VM” and “no library” problems just don’t exist in Clojure. The APIs are clean and clear. Best of all the dogma is minimal and pragmatic. HUZZAH!

 

OBE

Overcome By Events — this is my excuse for NOT submitting an entry for the ICFP contest this year. It’s Sunday, 0900 EDT as I write this and not a single line of code has been written. On top of that, today is going to be busy. Maybe next year.

 

ICFP 2009 is off and running!

Well, the ICFP 2009 programming challenge started today. Team Foognostic has signed up (so far includes: me) and has skimmed through the problem description. It’s a little premature to judge it, but the task kind of feels like a MMIXed up version of last year’s contest.

Unless something awesome happens the physics portion of the challenge will be tackled last, if at all. And I still intend to use Clojure.. I have a little ant based environment to assist.

If anyone is interested in joining team Foognostic, shoot me a mail at tbbs@sbbtabfgvp.arg, unless rot13 ain’t your thing. Don’t wait though, the contest ends June 29th at 14:00 CDT!

 

Reduce your way into good Clojure

In my previous post I discussed replacing for loops with the (map) function. That works, until it doesn’t.

Another step needed to generate the cosine similarity for two strings is to create a frequency histogram, or how many times each character pair occurs in a string. This is a pretty good fit for a hash map, where the keys are the character pairs, and the values the occurrences.

Here was my initial try. This code meant to do well… but went far, far away from where I wanted. Let’s take a detailed look at my failings:

user=> 
(let [hm (hash-map)]
  (map
    (fn [key val]
      (assoc hm key val))
      ["a" "b" "a"]
      [1 2 3]))

What I wanted: a map like this => {"b" 2, "a" 3}. What I got was three hash maps, each with one key/value pair => ({"a" 1} {"b" 2} {"a" 3}). The intent was to use map to iterate over the sequences, and use assoc to put the key and value into one hash map. And now, for the parade of errors…

  1. Use the zipmap function to build hashmaps like this: (zipmap ["a" "b" "a"] [1 2 3]) => {"b" 2, "a" 3}
  2. Variables in the let block cannot be changed…
  3. … but I really needed to update the hash map defined in the let block.

Rather than continue to stew uselessly I used Emacs to hop into the #clojure IRC channel. A wonderful person suggested the reduce function; I’d used it once in Ruby where it is best known as inject. I’m going to skimp on my description of reduce a little since that article is so well done.

reduce iterates over a collection like map, but it passes a mutable context to each callback. The return value of reduce is the final value of the context… essentially. I am still coming up to speed on it obviously.

Anyhow, enough yammering. Here is reduce in action:

(defn dot-product [l_histo r_histo]
  (reduce
   (fn [product key]
     (+ product (* (get l_histo key 0) (get r_histo key 0))))
   0
   (keys l_histo)))

Key points:

  • [l_histo r_histo] — these are the arguments to the dot-product function.
  • (fn defines an anonymous function.
  • (get gets the value for the specified key from the specified map, returning the final argument when the key is absent.
  • 0 is the default value for product
  • (keys returns the keys of the specified hash map
  • Clojure really wins a lot by delegating to Java. This happened for free: (Math/sqrt 100) => 10.0

So, reduce is a critical step for people coming from imperative programming languages looking to do basic things with collections.

 

Put that for loop down

I’ve been spending a lot of time with Clojure recently to prepare for the ICFP 2009 contest. It hasn’t been the easiest thing, but the difficulty has come from trying to grasp many new concepts at once. The documentation has been pretty good, and the IRC channel has been pretty worthwhile.

I’m trying to settle on a specific task to accomplish when learning a new language. Something concrete but still a little academic… implementing cosine similarity using character bigrams isn’t much code, but it covers enough bases to be useful.

Part of the process involves splitting a string into overlapping character pairs.

Bad old me instinctively thinks “for loop”.

Good new me thinks… bad old me probably has it right. But it doesn’t feel very Clojure-y. This solution is a little more Clojuriffic, probably still a long shot from idiomatic:

(defn split-bigrams [text]
  (map 
    (fn [l r] (str l r))
    (seq text)
    (rest (seq text))))

Bad old me (BOM) would have iterated over the string using a for loop with an index variable, cycling length – 1 times.

Good new me (GNM) had to take a different approach. I converted the string into two sequences; the first was just the string, but the second was the string minus the first character.

sequence A : [A B C D E F]
sequence A': [B C D E F]

The (map) function takes a callback and a number of sequences. The callback function gets the nth element from each sequence, and the output from the callback is collected and returned by (map). I don’t have to specify a length or repetition count, as (map) stops processing when any of the collections run out of data.

user=> (split-bigrams "ABCDEF")
("AB" "BC" "CD" "DE" "EF")

This hasn’t taken a lot of time, but more than I like. It has taken A LOT of patience, energy, and focus during that time. But I like finding new ways to solve problems. Now I have a new tool in my nerdbelt, one that pushes the for loop back a little bit.

 

Clojure for the ICFP 2009 contest?

So the programming contest for the 12th International Conference on Functional Programming is coming up soon — June 26th – 29th. Last year team Foognostic submitted a Ruby-based solution which made it to the second round. This year we will take another stab at it, this time with Clojure. Anyone else interested?

Last year I installed Trac to help organize things and that was fun, but this year I think BitBucket is the way to go. More features that someone else maintains — woo!

This year another Foognostic solution will be submitted, barring unexpected events that weekend. For the sake of trying to look cool I will be using Clojure, mostly as an exercise to really explore a Lisp. I’m more familiar with Emacs Lisp, but Clojure is sitting right on top of all those featuriffic Java APIs.

The countdown timer stands at a little more than two weeks. That should be enough time to come up to speed, right? … or have I just uttered famous last words?

In any event, watch this space for updates!

 

Configuring projects in org mode, or defining variables in Lisp is strange

Org mode in GNU Emacs makes it somewhat easy to publish its files as a small static website. You define the files to publish, how to convert them, where to publish them, and so on. The bad news is that you need to manually write a big data structure in Emacs Lisp. Not what a Lisp newbie really wanted to hear (no customize page?) but I decided org mode was worth the pain. Speaking of pain, here’s the data structure:

 0:  (set
 1:    (quote org-publish-project-alist)
 2:    '
 3:    (
 4:      (
 5:        "tsn"
 6:        :base-directory "~/Documents/foog/tsn"
 7:        :base-extension "org"
 8:         :publishing-directory "/dev/null"
 9:         :publishing-function org-publish-org-to-html
10:      )
11:    )
12:  )

All that work just to define a list of lists? I could just write it in Ruby as:

 # pretend this function exists in Ruby
 # def org_publish_to_html() end
org_publish_project_alist = [
    [ "tsn",
      :base_directory, "~/Documents/foog/tsn",
      :base_extension, "org",
      :publishing_directory, "~/Documents/foog/tsn_export",
      :publishing_function, org_publish_to_html ]
];

The major difference is all this set and quote nonsense. How many function calls does it take to initialize a variable in Lisp?? It turned out that answering that question well required some spelunking into Lisp. It kind of makes me appreciate GNU Emacs Lisp as a low-level high-level language.

And now, we plunge through the rabbit hole!

So, why call the (quote) function when defining a variable? I didn’t realize what a good question this was. The answer means paying close attention to references and values. Let’s take a quick look at a function which determines whether its argument is a value or a reference:

(symbolp 123)           => nil  ;; nil is false in Emacs Lisp
(symbolp (quote 123))   => nil  ;; 123 passes through...
(symbolp (quote "abc")) => nil  ;; abc passes through
(symbolp (quote foo))   => t    ;; but foo is a symbol.

Okay, so now we can distinguish between a value and a reference. The next step is assigning a value to a reference. This is where (quote) is essential.

(set foo 123) => Lisp error: (void-variable foo)

That silly Lisp interpreter, trying to get a value while we are in the middle of setting it. That’s where (quote) comes in. The function call to quote is evaluated, and the result is passed as the first argument to (set). That makes some sense, but because variable definition happens ALL THE TIME in imperative programming languages Emacs Lisp has tried to make this as easy as possible:

(set (quote foo) 123)  => 123  ;; No one does this
(set 'foo 456)         => 456  ;; 'foo = (quote foo)
(setq foo 789)         => 789  ;; People do this one.

INTERMISSION

Now that variable assignment has been hashed out we can pick up speed a bit and finish this off:

  • Line 1 specifies the symbol we want to use.
  • Line 2 is just a macro for (quote), so the contents will be returned without being evaluated.
  • Line 3 starts a list. To me this implies that the variable org-publish-project-alist will contain an array of … well, not sure yet.
  • Line 4 starts a list. This list will be the first item in the list defined by org-publish-project-alist.
  • Line 5 is just a string. It is the first element in the list. Calling (car) on the list started at line 4 would return "tsn".
  • Line 6 contains the 2nd and 3rd items in the list. :base-directory is a keyword symbol ((symbolp :base-directory) => t). "~/Documents/foog/tsn" is the next element in the list.
  • Lines 7-9 define the remnant of the list started at line 4.
  • The remaining lines are the obligatory closing parens.

Probably because I am very new to Lisp I don’t understand why flat lists are used to store this data. Key value pairs are a great fit for hash maps or associative lists (the variable name ends with alist…). My first attempt to restructure the data would be:

(setq org-projects
  '(
     ("tsn"
       (:base-directory . "~/Documents/foog/tsn")
       (:base-extension . "org")
       (:publishing-directory . "~/Documents/foog/tsn_export")
       (:publishing-function . org-publish-org-to-html))
     ("css" 
       (:base-directory . "~/Documents/foog/tsn/css")
       (:base-extension . "css"))
     ("img" 
       (:base-directory . "~/Documents/foog/tsn/images")
       (:base-extension . "jpg|png"))
  ))

The difference being that key value pairs are stored in an associative list. This lets you take advantage of built in functions for getting property values.

(mapcar
  (lambda (elt)
    (let
      ((project (car elt))
       (properties (cdr elt)))
      (print (format "Project %s has base extension %s"
                      project
                      (cdr (assoc :base-extension properties))))))
  org-projects)

Output: (”Project tsn has base extension org” “Project css has base extension css” “Project img has base extension jpg|png”)

Wow, look at all the rambling. At least now I understand how to manually configure org mode’s projects.

Lisp is neat.

 

introducing jawarepl

JAWAREPL is a JAva Web Application Read Eval Print Loop. It loads a Spring-based Java .war file into groovysh, and then makes its fully actived Spring beans easy to use.

Here’s a quick example. It uses ‘petclinic’, one of the sample apps included with Spring:

$ groovysh
. /Users/moi/Documents/code/jawarepl/jawarepl.groovy
inst = new JAWAREPL();

Here’s how you configure it:

inst.war_path = "/path/to/petclinic/dist/petclinic.war";
inst.context_paths = [ "WEB-INF/applicationContext-hibernate.xml" ];
ctx = inst.context;

After that, the sky’s the limit! Start grabbing beans out and call all the methods you want. Put this into groovysh:

clinic = ctx.getBean("clinic");
clinic.vets.each {
   println "vet = ${it.lastName}, ${it.firstName}";
   it.specialties.each {
       println "    $it";
   }
}

And this should come out:

vet = Carter, James
vet = Douglas, Linda
    dentistry
    surgery
vet = Jenkins, Sharon
vet = Leary, Helen
    radiology
vet = Ortega, Rafael
    surgery
vet = Stevens, Henry
    radiology

Just to demonstrate that it’s not only for reading data, here is another sample where it adds a visit to the petclinic.

owner = clinic.findOwners("Schroeder")[0];
owner.pets.visits.each { println "$it.date, $it.description" }
[2009-01-24, 2009-01-24], [JAWAREPL test, JAWAREPL test]
visit = new org.springframework.samples.petclinic.Visit();
visit.date = new Date();
visit.pet = pets[0];
visit.description = "JAWAREPL test2";
clinic.storeVisit(visit);
...
Hibernate: insert into visits (visit_date, description, pet_id) values (?, ?, ?)
owner = clinic.findOwners("Schroeder")[0];
owner.pets.visits.each { println "$it.date, $it.description" }
[2009-01-24, 2009-01-24, 2009-01-24], [JAWAREPL test,
JAWAREPL test2, JAWAREPL test]

More detailed instructions are available on bitbucket.

JAWAREPL has been tested on three of the sample Spring apps and a basic Grails app. Those are pretty trivial samples and even so, it was a minor task to make them all work; the Grails war had none of the GORM mojo stitched in so it was really not very useful (patches anyone?) That being said, I would not expect a complex war file to load smoothly. I seem to recall some sort of mock/mini JNDI provider in Spring if that what goes wrong. I will try to look at any bug reports, or much better yet patches.

 

Implementing xor in a bitunwise way

Bitunwise is one of my pet projects. It uses a common algorithm to reimplement memcpy, memset, memcmp, bzero, and kin. I decided to add support for xor’ing two byte streams. The first patch was small and easy to write, but it needed better abstraction. I took another look at the functions I was trying to implement and realized they can be grouped by number of input streams:

  • zero input streams: bzero
  • one input stream: memset, memcpy
  • two input streams: xor

So now instead of calling into the core algorithm manually, the emitting functions call the appropriate arity helper method. It worked out to the same number of lines of code… BUT future emitting methods (e.g. nand) will be one or two liners.

 

Yay for BitBucket

It’s 2009 and I still prefer CVS to Subversion. I earned my source control badge with CVS and learned to live with its warts. Better the devil you know, you know? I decided to learn as much svn as necessary while waiting for the next generation of version control software.

Distributed version control seems to have surged since Linus Torvalds kicked Bitkeeper to the curb in early 2005. Of course he immediately started writing a DVCS named git. His celebrity brought it instant attention, even an hour long presentation at Google HQ (I have to watch this someday).

About a year ago I downloaded version 1.5.3.8 of git and played with it. It was okay, I guess. It wasn’t easy to love, not that I mind that too much. The bin directory contains 145+ standalone apps, although I guess they are typically used in the unix-y “my stdout is your stdin” approach?

Git’s got the goods, no doubt. It capably manages a top-tier complex and distributed codebase. That is an uncommonly difficult task and satisfying it means leaving some facets less polished. Namely, learning curve and ease of use. I didn’t feel like tackling both of those while learning a new type of version control.

Mercurial competes with git in the DVCS arena. It has a nice website and some really nice quick reference cards. It didn’t and still doesn’t distract me while I’m trying to grok what’s different and possibly better about DVCS.

Part of “distributed” means having a read/write repository somewhere on the internet. There’s probably a few ways to do it; “hg serve” runs an HTTP server with a decent interface for your repository. It’s pretty nice, I’ve tried it. But the fewer servers I have to maintain the better. That’s where BitBucket comes in.

Another part of “distributed” means making it easy to actually share a repository. Pushing and pulling files is critical but viewing prior revisions, tracking bugs, having a wiki, and rss/atom feeds are big wins for daily work. BitBucket offers those and makes it easy. Really easy. Within 30-45 minutes or so I had:

  1. Signed up using my OpenID / ClaimID account.
  2. Uploaded a public key
  3. Created a repository and pushed changes over ssh
  4. Updated Google Reader with an RSS feed for my repository

All for free! Apparently they also offer features which simplify all the distribute-y goodness in Mercurial. But that’s a far cry from where I am now.