Mary Rose Cook

Git in six hundred words

(This essay is a companion piece to Gitlet, my implementation of Git in JavaScript.)

Imagine you have a directory called alpha. It contains a file called number.txt that contains the text first.

You run git init to set up alpha as a Git repository.

You run git add number.txt to add number.txt to the index. The index is a list of all the files that Git is keeping track of. It maps filenames to the content of the files. It now has the mapping number.txt -> first. Running the add command has also added a blob object containing first to the Git object database.

You run git commit -m first. This does three things. First, it creates a tree object in the objects database. This object represents the list of items in the top level of the alpha directory. This object has a pointer to the first blob object that was created when you ran git add. Second, it creates a commit object that represents the version of the repository that you have just committed. This includes a pointer to the tree object. Third, it points the master branch at the new commit object.
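The three objects can be pictured as a little linked structure. Below is a conceptual sketch in plain JavaScript, not Gitlet's actual representation (real Git stores each object in the database under a hash of its content):

```javascript
// A sketch of the objects created by git add and git commit -m first.
// Illustration only: names and shapes are invented for clarity.
var blob = { type: "blob", content: "first" };
var tree = { type: "tree", entries: { "number.txt": blob } };
var commit = { type: "commit", tree: tree, parent: undefined, message: "first" };

// The master branch is just a pointer to the new commit object.
var master = commit;
```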

You run git clone . ../beta. This creates a new directory called beta. It initializes it as a Git repository. It copies the objects in the alpha objects database to the beta objects database. It points the master branch on beta at the commit object that the master branch points at on the alpha repository. It sets the index on beta to mirror the content of the first commit. It updates your files - number.txt - to mirror the index.

You move to the beta repository. You change the content of number.txt to second. You run git add number.txt and git commit -m second. The commit object that is created has a pointer to its parent, the first commit. The commit command points the master branch at the second commit.

You move back to the alpha repository. You run git remote add beta ../beta. This sets the beta repository as a remote repository.

You run git pull beta master.

Under the covers, this runs git fetch beta master. This finds the objects for the second commit and copies them from the beta repository to the alpha repository. It points alpha’s record of beta’s master at the second commit object. It updates FETCH_HEAD to show that the master branch was fetched from the beta repository.

Under the covers, the pull command runs git merge FETCH_HEAD. This reads FETCH_HEAD, which shows that the master branch on the beta repository was the most recently fetched branch. It gets the commit object that alpha’s record of beta’s master is pointing at. This is the second commit. The master branch on alpha is pointing at the first commit, which is the ancestor of the second commit. This means that, to complete the merge, the merge command can just point the master branch at the second commit. The merge command updates the index to mirror the contents of the second commit. It updates the working copy to mirror the index.
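The fast-forward check the merge command performs here can be sketched in a few lines of plain JavaScript (again, an illustration of the idea, not Gitlet's code):

```javascript
// If the current branch's commit is an ancestor of the fetched commit,
// merging is just moving the branch pointer forward.
var isAncestor = function(ancestor, commit) {
  if (commit === undefined) {
    return false; // walked past the root commit without finding it
  } else if (commit === ancestor) {
    return true;
  } else {
    return isAncestor(ancestor, commit.parent);
  }
};

var first = { message: "first", parent: undefined };
var second = { message: "second", parent: first };

var master = first;
if (isAncestor(master, second)) {
  master = second; // fast-forward: point the branch at the second commit
}
```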

You run git branch red. This creates a branch called red that points at the second commit object.

You run git checkout red. Before the checkout, HEAD pointed at the master branch. It now points at the red branch. This makes the red branch the current branch.

You set the content of number.txt to third, run git add number.txt and run git commit -m third.

You run git push beta red. This finds the objects for the third commit and copies them from the alpha repository to the beta repository. It points the red branch on the beta repository at the third commit object, and that’s it.

(If you would like to learn about the internals of Git in detail, you can read my essay, Git from the inside out.)

 

Racecar

In December 2014, I did the Ludum Dare game jam. I made Racecar, a top-down 2D racing game, in forty-eight hours. I used Coquette, my micro framework for JavaScript games. A few days later, I cleaned up the Racecar code and added the game to the Coquette demos.

 

My speech to new Recursers

Hi. I’m Mary. I’m a facilitator at the Recurse Center.

First, I’m going to talk about what facilitators do. Second, I’m going to give you advice about making the most of your time at the Recurse Center.

What do facilitators do?

We’re here to help you help yourselves learn. We can do that in a number of ways.

We can review your code. That is: we can look at your code and talk about ways it might be improved. If you want code review, it’s best if you come prepared. Identify areas of your code that you’re worried about. Or pick a part of your program and a goal, like making the code faster, more idiomatic, shorter, easier to read, more functional or more object-oriented.

We can pair program with you. That is: we can collaborate with you on your code. When you pair with someone, try to be your most conscientious programmer self. Practice higher level programming skills like diving deep and debugging systematically.

We can help you choose a project that fits your learning goals for the Recurse Center. This might mean pointing you for inspiration to the well-thumbed list of Fun and Educational Recurse Center Projects. This might mean talking through what your learning goals actually are.

We run workshops on subjects that interest us. Recently, I’ve run workshops on getting started with functional programming, writing a game and exploring prototypal inheritance in JavaScript.

We can give you advice on how to get started with a new project or technology. That might mean talking about a reasonable architecture for your project. Or it might mean talking through the basics of a technology.

We can recommend other Recursers, facilitators or alumni who share your interests and who you might enjoy working with.

We can help you get unstuck when you’ve been struggling with a problem. Some of us use a guideline about when to ask for help. If you’re stuck, you must struggle for fifteen minutes. If you’ve struggled for fifteen minutes, you must ask for help. One useful way to spend that fifteen minutes is to rubber duck your problem. That is, explain your problem out loud or in your head to a literal or notional yellow rubber duck. Sometimes, this yields the answer. Sometimes, it identifies a promising avenue of investigation. And it always clarifies your understanding of the problem.

Finally, we can talk to you if you’re having problems in your personal life. Your time at the Recurse Center can be hard. It’s a new environment that is isolated from family and old friends. We can talk with you about problems you’re having.

All the things facilitators do for you can be done by you for each other: code review, pairing, running workshops, giving counsel and so on. There’s only one difference between facilitators and Recursers. Your priority is your learning. But we’re paid to be here, so our priority is your learning, too.

Advice for getting the most out of your time at the Recurse Center

You’ve probably made big sacrifices to come here. Maybe you’ve saved up all your money for months or years. Maybe you’ve quit your job. Maybe you’ve moved to a new city. Maybe you’ve decided to neglect your hobbies, or your friends, or your children.

If you make your time at the Recurse Center the most productive three months of your life, that’s fine. That’s pretty OK. But, given the sacrifices you’ve made, it would be even better to use your time here to learn skills that will compound your improvement as a programmer forever. The Recurse Center is a great place to put into muscle memory these higher level skills that get neglected under pressure from grades or bosses or worrying about what the internet will think.

There are four skills that have paid great dividends for me.

Dive deep. You’re using a framework, library or language and you realise you don’t understand something about it. Take the time to understand by reading the source, playing around in a REPL or reading the documentation. This will cement your mental model and help make you a sure-footed programmer.

Debug systematically. When you’re stuck on a bug, don’t just type in things you think might fix the problem. Form a hypothesis about what the problem is, then run experiments. If the results support the hypothesis, the problem is now clear and that is most of the battle. Otherwise, use the observations from your experiments to form a new hypothesis.

Learn your tools. Fix those nagging problems in your config. Learn more keyboard shortcuts. Automate processes you do regularly. Don’t lose days on this. Fix one thing, then get back to work.

Learn one programming language really well. This takes months or years. But it has two big advantages. First, you now have a language you can think in. You can express an algorithm or a thought fluently without looking up syntax or APIs. Second, you will learn about the deep ideas in programming. You could learn these deep ideas by learning new languages. Programming in Erlang teaches you about concurrency. Programming in C teaches you about memory management. But sticking with one language keeps you in familiar surroundings. This lets you appreciate the subtleties of these fundamental ideas. You know: space, time, shit like that.

 

Scarface, Prince of the City and Pieter de Hooch

Brian De Palma is very good at making his scenes feel like they are part of a living world. Look at the first drug deal in Scarface. There are two shots that connect the car on the street to the rooms in the apartment where the deal is being done.

In the first shot, the camera follows Tony and Chico across the road and up the stairs to the door of the apartment.

In the second shot, the camera goes from the window of the sitting room, down into the street to the car, then back up into the apartment through the bathroom window.

But, the effect of a living world is diluted by a feeling of voyeurism. The second shot relies on the movement of the camera, rather than the movement of the objects in the scene. The camera is independent of the world.

Sidney Lumet achieves the same effect of a living world. But, his camera is mostly an observer. It is static, or it makes only unintrusive movements. It never peers. In this shot from Prince of the City, a whole block is brought alive when two running men disappear behind a building and reappear on the other side. The pan and track show us the action, but at a remove. Like the first Scarface shot, it is the movement of the characters from one place to another that brings everything to life. But, Lumet creates an even subtler effect by making the men disappear and reappear. This goes beyond connecting one visible place to another. It suggests places that are out of sight, which is more evocative of a living world.

Pieter de Hooch achieves the same effect in his painting, Paying the Hostess. Look at how light in the sky on the far left shines in through a hidden window, through a mostly obscured back room and finally falls on the wall on the far right. The light makes real the things we can’t see. It, too, is better than De Palma because it achieves its effect by making something disappear and then reappear. But, remarkably, de Hooch does it with a still image. No moving objects. No moving frame. Just moving light.

 

Testing from the ground up

Tests are pieces of code that check if your main code works. I write tests to catch bugs when I refactor. I write tests to force myself to think through and handle edge cases. I write tests to show the users of my project that my code does what I say it does.

For this essay, I will describe the code and tests for a tiny web app that draws a blue sky if it’s day time, and a black sky if it’s night time.

I will describe all the code I wrote. The web app. The microscopic testing framework. The tests for the client side code. The mocks that fake layers of the web app and technologies that are not pertinent. The tests that use these mocks. The refactored code that simplifies the mocks by dividing the web app code into different pieces that have single responsibilities.

Along the way, I will talk about many fun things. Temporarily patching libraries written by other people. Writing code that pretends to be the internet. Making Ajax requests by hand. Writing a little web server. Examining a function to find out how many arguments it expects. Making asynchronous blocks of code run serially so they don’t tread on each other’s toes.

The code

To see the code from this essay in runnable form, go to the Testing From The Ground Up GitHub repository. At each evolution of the code, I will include a link to the corresponding commit.

The web app

This is the HTML that defines the only page in the web app. It has a canvas element that displays the sky. It loads the client side JavaScript, client.js. When the page has loaded, it calls loadTime(), the one and only function.

<!doctype html>
<html>
  <head>
    <script src="client.js"></script>
  </head>

  <body onload="loadTime();">
    <canvas id="canvas" width="600" height="100"></canvas>
  </body>
</html>

Below is the client side JavaScript.

loadTime() starts by making an Ajax request to the server to get the current time. This is done in several steps. First, it creates an XMLHttpRequest object. Second, near the bottom of the function, it configures the object to make a GET request to "/time.json". Third, it sends the request. Fourth, when the requested time data arrives at the client, the function bound to request.onload fires.

The bound function grabs the drawing context for the canvas element. It parses "day" or "night" from the JSON returned by the server. If the value is "day", it sets the draw color to blue and draws a rectangle that takes up the whole canvas. If the value is "night", the color is black.

;(function(exports) {
  exports.loadTime = function() {
    var request = new XMLHttpRequest();
    request.onload = function(data) {
      var ctx = document.getElementById("canvas").getContext("2d");
      var time = JSON.parse(data.target.responseText).time;
      var skyColor = time === "day" ? "blue" : "black";

      ctx.fillStyle = skyColor;
      ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
    };

    request.open("GET", "/time.json", true);
    request.send();
  };
})(this);

Below is the server-side JavaScript. It is run in Node.js. Near the bottom, the code uses the Node HTTP module to create a web server. It specifies that every web request should be handled by requestHandler(). Each time this function is called, Node passes it a request object that has the URL that was requested, and a response object that can be used to send data back to the client.

If the client requested "/", the root, the index.html file is read from disk and its contents are sent back to be displayed in the user’s browser. If "/time.json" was requested, the server looks up the time, creates a piece of JSON that looks something like { "time": "day" } and sends it back to the user’s web browser.

var http = require("http");
var fs = require("fs");

var requestHandler = function (request, response) {
  if (request.url === "/") {
    fs.readFile(__dirname + "/index.html", function (err, data) {
      response.writeHead(200, { "Content-Type": "text/html" });
      response.end(data);
    });
  } else if (request.url === "/client.js") {
    fs.readFile(__dirname + "/client.js", function (err, data) {
      response.writeHead(200, { "Content-Type": "application/javascript" });
      response.end(data);
    });
  } else if (request.url === "/time.json") {
    response.writeHead(200, { "Content-Type": "application/json" });

    var hour = new Date().getHours();
    var time = hour > 6 && hour < 20 ? "day" : "night";

    response.end('{ "time": "' + time + '" }');
  }
};

http.createServer(requestHandler).listen(4000);

exports.requestHandler = requestHandler;

Here is the code for the basic web app.

A minuscule testing framework

It is possible to write tests that each include the generic code for reporting their outcome, for reporting errors, for calling the next test. But, that means a lot of repetition. It’s easier to use a testing framework. This is the one I wrote.

var test = function() {
  gatherTests(arguments).forEach(function(userTest) {
    userTest();
    process.stdout.write(".");
  });

  console.log();
};

test.isEqual = function(a, b) {
  if (a !== b) {
    throw a + " and " + b + " are not equal";
  }
};

var gatherTests = function(testArgs) {
  return Array.prototype.slice.call(testArgs).filter(function(x, i) {
    return i % 2 === 1;
  });
};

module.exports = test;

test() could be used to write a test like this:

var test = require("./test");
test(
  "should test 1 equals 1", function() {
    test.isEqual(1, 1);
  });

Which, when run, would look like this:

$ node test.js
  .

How does the testing framework run the tests it is given?

test() takes a series of arguments that alternate between string descriptions and test functions. It throws away the ones that are strings, which are merely human-readable window-dressing. It puts the functions into an array. It walks through that array, calling each function and printing a period to indicate that the test passed.

The test functions use test.isEqual() to assert that certain variables have certain values. (A real testing framework would have a more permissive and pragmatic version of isEqual() that treated a case like test.isEqual({ love: "you" }, { love: "you" }) as a pass.) If an assertion turns out to be unfounded, an exception is thrown, an error is printed and the tests stop running.
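A deep-comparison version of the assertion might look something like the sketch below. deepIsEqual() is my own name, not part of the essay's framework, and it only handles primitives, arrays and plain objects:

```javascript
// Recursively compare two values: primitives by ===, objects and arrays
// key by key. A sketch, not a complete structural-equality implementation
// (it ignores prototypes, Dates, cycles and so on).
var deepIsEqual = function(a, b) {
  if (a === b) {
    return true;
  } else if (typeof a === "object" && a !== null &&
             typeof b === "object" && b !== null) {
    var aKeys = Object.keys(a);
    var bKeys = Object.keys(b);
    return aKeys.length === bKeys.length &&
      aKeys.every(function(key) {
        return deepIsEqual(a[key], b[key]);
      });
  } else {
    return false;
  }
};
```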

Mocking the server and the internet

I don’t want to have to run the server when I run the tests. That would be an extra step. It would add statefulness that would need to be reset before each test. It would necessitate network communications between the test and the server.

This is the code I wrote that pretends to be the server deciding what time it is and the internet relaying that information back to the client. It does this by faking an Ajax request. It replaces the XMLHttpRequest constructor function with a function that returns a home-made Ajax request object. This object swallows the web app’s call to open(). When the web app calls send(), it calls the function that the web app has bound to onload. It passes some fake JSON that makes it seem like it is always day time.

global.XMLHttpRequest = function() {
  this.open = function() {};

  this.send = function() {
    this.onload({
      target: { responseText: '{ "time": "day" }' }
    });
  };
};

Mocking the canvas and the DOM

When the real code runs in a real web browser, it renders the sky to the real canvas in real blue. This is problematic. First, I don’t want to require a browser to test my code. Second, even if I capitulated to that requirement, it would be hard to check if the right thing happened. I would probably need to look at the individual pixels of the canvas drawing context to see if they were bluey.

Instead of walking down that horrid road, I wrote some code that pretends to be the browser DOM and the canvas element. It redefines getElementById() to return a fake canvas. This has a fake getContext() that returns a fake drawing context that has a fake fillRect() and a fake reference to a fake canvas that has a width and height. Instead of drawing, this function checks that the arguments passed to it have the expected values.

global.document = {
  getElementById: function() {
    return {
      getContext: function() {
        return {
          canvas: { width: 300, height: 150 },
          fillRect: function(x, y, w, h) {
            test.isEqual(x, 0);
            test.isEqual(y, 0);
            test.isEqual(w, 300);
            test.isEqual(h, 150);
            test.isEqual(this.fillStyle, "blue");
          }
        };
      }
    };
  }
};

This is the full test.

var test = require("./test");
var client = require("../client");

test(
  "should draw blue sky when it is daytime", function() {
    global.XMLHttpRequest = function() {
      this.open = function() {};

      this.send = function() {
        this.onload({
          target: { responseText: '{ "time": "day" }' }
        });
      };
    };

    global.document = {
      getElementById: function() {
        return {
          getContext: function() {
            return {
              canvas: { width: 300, height: 150 },
              fillRect: function(x, y, w, h) {
                test.isEqual(x, 0);
                test.isEqual(y, 0);
                test.isEqual(w, 300);
                test.isEqual(h, 150);
                test.isEqual(this.fillStyle, "blue");
              }
            };
          }
        };
      }
    };

    client.loadTime();
  });

Don’t worry. That code is as bad as this essay gets.

Here is the code that includes the testing framework and the first client side test.

The drawing code refactored into its own module

Those mocks are pretty horrid. They make the test very long, which discourages me from writing tests to check for other ways the code might go wrong. The mocks are horrid because the client side code is one big function that communicates with the server, asks what time it is, parses the response and draws in the canvas.

To solve this problem, I refactored the code by pulling the drawing code out into its own module, renderer. This module includes fillBackground(), a function that fills the canvas with the passed color. As a side benefit, the main web app code is now easier to understand and change.

;(function(exports) {
  exports.loadTime = function() {
    var request = new XMLHttpRequest();
    request.onload = function(data) {
      var time = JSON.parse(data.target.responseText).time;
      var color = time === "day" ? "blue" : "black";
      renderer.fillBackground(color);
    };

    request.open("GET", "/time.json", true);
    request.send();
  };

  var renderer = exports.renderer = {
    ctx: function() {
      return document.getElementById("canvas").getContext("2d");
    },

    fillBackground: function(color) {
      var ctx = this.ctx();
      ctx.fillStyle = color;
      ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
    }
  };
})(this);

This lets me replace the complex document mock with a short renderer.ctx() mock. The test becomes shorter, simpler and less brittle.

var test = require("./test");
var client = require("../client");

test(
  "should draw sun and blue sky in canvas when it is daytime", function() {
    global.XMLHttpRequest = function() {
      this.open = function() {};

      this.send = function() {
        this.onload({
          target: { responseText: '{ "time": "day" }' }
        });
      };
    };

    client.renderer.ctx = function() {
      return {
        canvas: { width: 300, height: 150 },
        fillRect: function(x, y, w, h) {
          test.isEqual(x, 0);
          test.isEqual(y, 0);
          test.isEqual(w, 300);
          test.isEqual(h, 150);
          test.isEqual(this.fillStyle, "blue");
        }
      }
    };

    client.loadTime();
  });

Here is the code for the modularised renderer and resulting simplified client side test.

I modularised the client side code further by splitting out three more functions.

This is get(). It makes an Ajax request to the passed URL. The Ajax request object calls the passed callback with the response.

var get = exports.get = function(url, callback) {
  var request = new XMLHttpRequest();
  request.onload = callback;
  request.open("GET", url, true);
  request.send();
};

This is getTime(). It uses get() to make an Ajax request to "/time.json" and parses "day" or "night" from the response.

exports.getTime = function(callback) {
  exports.get("/time.json", function(data) {
    callback(JSON.parse(data.target.responseText).time);
  });
};

This is displayTime(). It takes a string with the value "day" or "night" and draws either a blue sky or a black sky.

exports.displayTime = function(time) {
  var color = time === "day" ? "blue" : "black";
  renderer.fillBackground(color);
};

I changed the body tag in the HTML page. It now calls getTime(), passing displayTime() as the callback.

<body onload="getTime(displayTime);">
  <canvas id="canvas" width="600" height="100"></canvas>
</body>

Having more modular code means that I can mock parts of the web app API, rather than mocking code written by third parties. This makes it easier to write tests that test a specific piece of functionality.

Using this refactor, I could write tests that are more extensive. The first test checks that getTime() correctly parses JSON sent by the server. The second test checks that a call to fillBackground() draws a rectangle at the correct position and size. The third test checks that displayTime() draws a rectangle of the correct color for the time of day.

var test = require("./test");
var client = require("../client");

test(
  "should parse time from server", function() {
    client.get = function(_, callback) {
      callback({ target: { responseText: '{ "time": "day" }' }});
    };

    client.getTime(function(time) { test.isEqual(time, "day"); });
  },

  "should draw rect of passed size and color when fillBackground() called", function() {
    client.renderer.ctx = function() {
      return {
        canvas: { width: 300, height: 150 },
        fillRect: function(x, y, w, h) {
          test.isEqual(x, 0);
          test.isEqual(y, 0);
          test.isEqual(w, 300);
          test.isEqual(h, 150);
        }
      }
    };

    client.renderer.fillBackground("blue");
  },

  "should draw blue sky when it is daytime", function() {
    client.renderer.fillBackground = function(color) {
      test.isEqual(color, "blue");
    };

    client.displayTime("day");
  });

Here is the code for the more modular client code and more extensive tests.

Testing the server side code

Asynchronous tests

Some of the responsibilities of a web server require asynchronous operations. If a browser sends a request to the web app for the URL "/", it gets back the contents of the index.html file. To make this happen, the file needs to be read from disk asynchronously and the contents sent to the client once they have been read. Similarly, if the browser requests the URL "/client.js", it gets back the contents of the client.js file.

These are the tests I wrote to check that both cases are handled correctly.

var test = require("./test");
var requestHandler = require("../server").requestHandler;
var http = require("http");

test(
  "should send index.html when / requested", function() {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 9), "<!doctype");
    };

    requestHandler({ url: "/" }, new http.ServerResponse({}));
  },

  "should send client.js when /client.js requested", function() {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 10), ";(function");
    };

    requestHandler({ url: "/client.js" }, new http.ServerResponse({}));
  });

The output when I run these tests:

$ node server_tests.js
  ..

  test.js:12
    throw a + " and " + b + " are not equal";
                          ^
  <!doctype  and ;(function are not equal

An exception is thrown, yet there are two periods indicating two passed tests. Something is wrong.

In the first test, response.end() is mocked with a function that checks that
"<!doctype" is sent to the user. Next, requestHandler() is called, requesting the URL "/". requestHandler() starts reading index.html from disk. While the file is being read, the test framework presses on with its work. Uh oh. That is, the framework prints a period and starts the second test, though the response.end() mock has not asserted the value of the response. response.end() is re-mocked with a function that checks that ";(function" is sent to the user. Double uh oh. requestHandler() is called by the second test. It requests the URL "/client.js". requestHandler() starts reading client.js from disk. The framework prints another premature period.

At some point later, index.html is read from disk. This means that the callback in requestHandler() is called and it calls response.end() with the contents of index.html. Unfortunately, by this time, response.end() has been mocked with a function expecting ";(function". The assertion fails.

This problem can be solved by running tests serially. A test is run. It signals it has finished. The next test is run.

This may seem pedantic. Shouldn’t tests that are testing asynchronous behaviour be able to cope with the dangers of asynchrony? Well, yes and no. They should certainly test the asynchronous behaviours of requestHandler(). But they should not have to cope with other tests messing about with their execution environment part way through their execution.

(It would be possible to go further and make the tests completely functionally pure. This could be done in a fundamentalist way: the test framework resets the execution context before each test. Or it could be done in a pragmatic way: each test undoes the changes it made to the execution environment. Both ways are outside the scope of this essay.)
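Though out of scope here, the pragmatic way can be sketched in a few lines. withMock() is an invented helper, not part of the essay's framework, and it only suits synchronous test bodies, because it restores the original property before any pending asynchronous callbacks run:

```javascript
// Install a mock on an object, run the test body, then restore the
// original property whatever happens. Sketch of the "each test undoes
// its changes" approach; synchronous bodies only.
var withMock = function(object, property, mock, body) {
  var original = object[property];
  object[property] = mock;
  try {
    body();
  } finally {
    object[property] = original;
  }
};
```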

I rewrote the testing framework to run asynchronous tests serially. Each asynchronous test binds a done() callback parameter. It calls this when it has made all its assertions. The testing framework uses the execution of this callback as a signal to run the next test. Here are the rewritten tests.

var test = require("./test");
var requestHandler = require("../server").requestHandler;
var http = require("http");

test(
  "should send index.html when / requested", function(done) {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 9), "<!doctype");
      done();
    };

    requestHandler({ url: "/" }, new http.ServerResponse({}));
  },

  "should send client.js when /client.js requested", function(done) {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 10), ";(function");
      done();
    };

    requestHandler({ url: "/client.js" }, new http.ServerResponse({}));
  });

A minuscule asynchronous testing framework

Below is the code for the asynchronous testing framework.

Look at runTests(). It takes userTests, an array that contains the test functions to be run. If that array is empty, the tests are complete and the program exits. If it is not empty, it looks at the length attribute of the next test function. If the attribute has the value 1, the test expects one argument: a done() callback. It runs testAsync(), passing the test function and a callback that prints a period and recurses on runTests() with the remaining test functions.

testAsync() creates a timeout that will fire in one second. It runs the test function, passing a done() callback for the test to run when it is complete. If the callback gets run, the timeout is cleared and testDone() is called to indicate that the next test can run. If the done() callback is never run by the test function, something went wrong. The timeout will fire and throw an exception, and the program will exit.

If the length attribute of the test function has the value 0, the function is run with testSync(). This is the same as testAsync(), except there is no timeout and the testDone() callback is called as soon as the test function has completed.
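The length trick is standard JavaScript: a function's length property reports the number of parameters it declares, which is how synchronous tests can be told apart from asynchronous ones. For example:

```javascript
// Function.prototype.length counts declared parameters.
var syncTest = function() {};        // declares no parameters
var asyncTest = function(done) {};   // declares one parameter

console.log(syncTest.length);  // 0
console.log(asyncTest.length); // 1
```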

var test = function() {
  runTests(gatherTests(arguments));
};

var runTests = function(userTests) {
  if (userTests.length === 0) {
    console.log();
    process.exit();
  } else {
    var testDone = function() {
      process.stdout.write(".");
      runTests(userTests.slice(1));
    };

    if (userTests[0].length === 1) {
      testAsync(userTests[0], testDone);
    } else {
      testSync(userTests[0], testDone);
    }
  }
};

var testSync = function(userTest, testDone) {
  userTest();
  testDone();
};

var testAsync = function(userTest, testDone) {
  var timeout = setTimeout(function() {
    throw new Error("Failed: done() never called in async test.");
  }, 1000);

  userTest(function() {
    clearTimeout(timeout);
    testDone();
  });
};

test.isEqual = function(a, b) {
  if (a !== b) {
    throw new Error(a + " and " + b + " are not equal");
  }
};

var gatherTests = function(testArgs) {
  return Array.prototype.slice.call(testArgs).filter(function(x, i) {
    return i % 2 === 1;
  });
};

module.exports = test;

Here is the code for the asynchronous testing framework and the new server tests.

An exercise

Now that it is possible to write asynchronous tests, I can write the tests for the server. Or, rather: you can.

If you are not sure where to start, try refactoring the server so the code is more modular. Write a function that sends the passed string with the passed Content-Type to the client. Write a function that reads a static file from disk and responds with the contents. Write a function that converts the current date into "day" or "night". Write a function that takes a request for a certain URL and sends the right response.

Once your refactor is complete, or maybe while it is in progress, you can write your tests.

Summary

I wrote a simple web app. I wrote tests for it and discovered that I could mock the pieces I didn’t want to run when I ran the tests. I discovered that scary things like Ajax and the canvas API really aren’t so scary when I wrote skeletal versions of them. I realised that the mocks I had written were quite verbose. I refactored the web app code to make it more modular. This made the code better and easier to change in the future. It meant I could mock the interfaces I had written, rather than those invented by other people. This meant the mocks became simpler or unnecessary. This made it easier to write tests, so I could write more of them and test the web app more extensively.

I wrote two tests for the server. I discovered that the test framework ran them in parallel, which meant they interfered with each other. I rewrote the test framework to run tests serially. I modified the tests to signal when they were finished. I handed over the rest of the job to you.

 

New York, 2014

A video I made about Hacker School.


 

Screencast about writing a Lisp interpreter in JavaScript

This is a recapitulation of the talk I gave at BrooklynJS in February 2014. I covered the same material at a more measured pace in this article. You can read the code on GitHub.

 

Moonface, live in Brooklyn

In May, I went to the barren warehouse district in Carroll Gardens in Brooklyn to see Moonface, Spencer Krug’s latest music project. I walked there from my friend Dave’s house, along wide pavements that contained nothing - no people, no benches, no parked cars, no bins, no street lights - along streets that were bordered by buildings with walls painted solid, chalky red, over a little bridge where, half way across, I could see down the canal in one of those weird long views that I used to see when I walked over the bridges in Berlin as the sun came up, one of those views you only get when you are half way across one of the numbered avenues in Manhattan, view clear for streets and streets, the sky not blocked but framed by buildings.

I arrived at the venue. Spencer Krug came on. It was just him and a piano. He sat down on the stool with a glass of what, based on the colour and volume, looked like apple juice, but was more likely to be a really very huge drink of whiskey.

He said, “I’m going to play some new songs tonight.” Most of what he played appears on his new album, “Julia With Blue Jeans On”.

The first song was about a drive he and a friend took across the US. It was about how they missed their exit because they were so engrossed in their conversation about their ex-girlfriends.

The second song had three sections. They corresponded to “Barbarian”, “Everyone is Noah, Everyone is the Ark” and “Barbarian II” on his new record. The first section was based around a persistent seesaw interval on the beautiful low notes of the piano. Spencer Krug sang, “I asked you where you want to be buried, and you asked me the name of the town where I was born.” He sang, “I am a barbarian, sometimes”, and tears came to my eyes.

The second section was, notionally, about Noah and the Ark. This section, and its relationship to the rest of the song, was an excellent summary of Spencer Krug’s entire musical catalogue. A stream of new melodies that beget more new melodies that beget more new melodies. Extended metaphors that, like Bob Dylan’s (“Yes, to dance beneath the diamond sky with one hand waving free, silhouetted by the sea, circled by the circus sands, with all memory and fate driven deep beneath the waves, let me forget about today until tomorrow”), combine the fantastical and the corporeal, and use the everyday details as a way to let the listener into the abstract flights of fancy: “If I am an animal I am one of the few that is self destructive. I have chewed through my beautiful muscle, I have chewed through my beautiful narrative to get out of Canada and into your door.”

But there are new elements, too. The third section of the song was a reprise. The seesaw interval and the barbarian and the tender burial wishes all came back.

The third song followed the same structure as the second: the first section introducing the themes, the second section going off into the wild, before leading into the third section’s reprise. Krug really let go at this point, bellowing, “Someone’s been writing your name all over the walls”, and singing “Love me for the way I used to be. Or, love me for the way that the skin tore open.” This song, “Love the House You’re In”, is on the album, but, strangely, this third part of the song is not.

He said, “Last night I played at, uh, Hamilton University about four and a half hours north of here in upstate New York. And there was literally a toga party going on in the same room. At one point, a woman went by in a shopping cart pushed by her friends. So, this is nicer.”

He began the fourth song by saying, “So, a lot of these are cheesy love songs and this is another one.” He ended it with a sheepish shrug of his shoulders.

He said, “It’s getting hot, better strip down”, and began taking off his shirt. Some people whooped and he said, “There’s a t-shirt under here, Ladies.” He paused. “This isn’t Hamilton University.”

The fifth song showed another new element of Spencer Krug’s music. He seems to have become a really excellent pianist. There are syncopations, trills, off-rhythm rolls that build to far more complex constructions than the medieval marching music of his keyboard playing for Sunset Rubdown and Wolf Parade. The instrumentals have elongated and become as important and complex as the parts with singing.

The seventh song was really very charming. “I’d say the only word worth singing is a name. I’d say the only name worth singing is not God, it’s you. Julia. As beautiful and simple as the sun. Julia with blue jeans on. I see you there, at the bottom of the stairs.”

The eighth and ninth songs, the last two he played, are, most regrettably, not on his album.

Before beginning the eighth song, Krug said, “This is about moving to Helsinki. It’s about a few other things too.” It includes the lines, “I didn’t know that my old life would become a swarm of swallows thinning out against the Aurora Borealis. That’s right, I said, ‘A swarm of swallows thinning out against the Aurora Borealis’.” Nine months ago, I moved to New York from England. The swarm of swallows thinning out describes perfectly the feeling of the past slipping away to leave only a future.

The ninth song was a rolling mass. For ten minutes, it got faster and more reckless and more glorious. It went, “There is an ageing arm across a naked waist. There is a fallen tree against a perfect January snow. And that’s as spiritual as I need to be.” And, “Breakwater to the sea, just strong enough to hold back the waves as they roll in. And I should have told you sooner that I’ll be relying on you, baby. And I should have told you sooner I think you’re just like me. And I guess I should have told you I am sometimes just an arm hanging out a window on the highway in the sun. Or a fallen tree against a January snowbank. Or a breakwater to the sea.”

What this doesn’t convey, what lyrics never convey, is the way all this felt as the music got faster and faster and faster and faster, taking the thematic repetitions and rolling them up into something so rich it was almost unbearable.

I recorded the set on the shitty speaker on my phone. I have listened to the tinny mp3s countless times over the last six months. I have not been able to get the gig out of my head.

 

A practical introduction to functional programming

Many functional programming articles teach abstract functional techniques. That is, composition, pipelining, higher order functions. This one is different. It shows examples of imperative, unfunctional code that people write every day and translates these examples to a functional style.

The first section of the article takes short, data transforming loops and translates them into functional maps and reduces. The second section takes longer loops, breaks them up into units and makes each unit functional. The third section takes a loop that is a long series of successive data transformations and decomposes it into a functional pipeline.

The examples are in Python, because many people find Python easy to read. A number of the examples eschew pythonicity in order to demonstrate functional techniques common to many languages: map, reduce, pipeline. All of the examples are in Python 2.

A guide rope

When people talk about functional programming, they mention a dizzying number of “functional” characteristics. They mention immutable data1, first class functions2 and tail call optimisation3. These are language features that aid functional programming. They mention mapping, reducing, pipelining, recursing, currying4 and the use of higher order functions. These are programming techniques used to write functional code. They mention parallelization5, lazy evaluation6 and determinism7. These are advantageous properties of functional programs.

Ignore all that. Functional code is characterised by one thing: the absence of side effects. It doesn’t rely on data outside the current function, and it doesn’t change data that exists outside the current function. Every other “functional” thing can be derived from this property. Use it as a guide rope as you learn.

This is an unfunctional function:

a = 0
def increment():
    global a
    a += 1

This is a functional function:

def increment(a):
    return a + 1

Don’t iterate over lists. Use map and reduce.

Map

Map takes a function and a collection of items. It makes a new, empty collection, runs the function on each item in the original collection and inserts each return value into the new collection. It returns the new collection.

This is a simple map that takes a list of names and returns a list of the lengths of those names:

name_lengths = map(len, ["Mary", "Isla", "Sam"])

print name_lengths
# => [4, 4, 3]

This is a map that squares every number in the passed collection:

squares = map(lambda x: x * x, [0, 1, 2, 3, 4])

print squares
# => [0, 1, 4, 9, 16]

This map doesn’t take a named function. It takes an anonymous, inlined function defined with lambda. The parameters of the lambda are defined to the left of the colon. The function body is defined to the right of the colon. The result of running the function body is (implicitly) returned.

The unfunctional code below takes a list of real names and replaces them with randomly assigned code names.

import random

names = ['Mary', 'Isla', 'Sam']
code_names = ['Mr. Pink', 'Mr. Orange', 'Mr. Blonde']

for i in range(len(names)):
    names[i] = random.choice(code_names)

print names
# => ['Mr. Blonde', 'Mr. Blonde', 'Mr. Blonde']

(As you can see, this algorithm can potentially assign the same secret code name to multiple secret agents. Hopefully, this won’t be a source of confusion during the secret mission.)

This can be rewritten as a map:

import random

names = ['Mary', 'Isla', 'Sam']

secret_names = map(lambda x: random.choice(['Mr. Pink',
                                            'Mr. Orange',
                                            'Mr. Blonde']),
                   names)

Exercise 1. Try rewriting the code below as a map. It takes a list of real names and replaces them with code names produced using a more robust strategy.

names = ['Mary', 'Isla', 'Sam']

for i in range(len(names)):
    names[i] = hash(names[i])

print names
# => [6306819796133686941, 8135353348168144921, -1228887169324443034]

(Hopefully, the secret agents will have good memories and won’t forget each other’s secret code names during the secret mission.)

My solution:

names = ['Mary', 'Isla', 'Sam']

secret_names = map(hash, names)

Reduce

Reduce takes a function and a collection of items. It returns a value that is created by combining the items.

This is a simple reduce. It returns the sum of all the items in the collection.

sum = reduce(lambda a, x: a + x, [0, 1, 2, 3, 4])

print sum
# => 10

x is the current item being iterated over. a is the accumulator. It is the value returned by the execution of the lambda on the previous item. reduce() walks through the items. For each one, it runs the lambda on the current a and x and returns the result as the a of the next iteration.

What is a in the first iteration? There is no previous iteration result for it to pass along. reduce() uses the first item in the collection for a in the first iteration and starts iterating at the second item. That is, the first x is the second item.
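
The mechanics can be sketched with a hand-rolled version, here called my_reduce(). This is a made-up toy for illustration, not Python’s real implementation, and unlike the real reduce() it cannot accept None as an explicit initial value:

```python
def my_reduce(fn, items, initial=None):
    if initial is None:
        # No initial accumulator given: use the first item
        # and start iterating at the second.
        a, rest = items[0], items[1:]
    else:
        a, rest = initial, items
    for x in rest:
        # The return value becomes the a of the next iteration.
        a = fn(a, x)
    return a

print(my_reduce(lambda a, x: a + x, [0, 1, 2, 3, 4]))
# => 10
```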

This code counts how often the word 'Sam' appears in a list of strings:

sentences = ['Mary read a story to Sam and Isla.',
             'Isla cuddled Sam.',
             'Sam chortled.']

sam_count = 0
for sentence in sentences:
    sam_count += sentence.count('Sam')

print sam_count
# => 3

This is the same code written as a reduce:

sentences = ['Mary read a story to Sam and Isla.',
             'Isla cuddled Sam.',
             'Sam chortled.']

sam_count = reduce(lambda a, x: a + x.count('Sam'),
                   sentences,
                   0)

How does this code come up with its initial a? The starting point for the number of incidences of 'Sam' cannot be 'Mary read a story to Sam and Isla.' The initial accumulator is specified with the third argument to reduce(). This allows the use of a value of a different type from the items in the collection.

Why are map and reduce better?

First, they are often one-liners.

Second, the important parts of the iteration - the collection, the operation and the return value - are always in the same places in every map and reduce.

Third, the code in a loop may affect variables defined before it or code that runs after it. By convention, maps and reduces are functional.

Fourth, map and reduce are elemental operations. Every time a person reads a for loop, they have to work through the logic line by line. There are few structural regularities they can use to create a scaffolding on which to hang their understanding of the code. In contrast, map and reduce are at once building blocks that can be combined into complex algorithms, and elements that the code reader can instantly understand and abstract in their mind. “Ah, this code is transforming each item in this collection. It’s throwing some of the transformations away. It’s combining the remainder into a single output.”

Fifth, map and reduce have many friends that provide useful, tweaked versions of their basic behaviour. For example: filter, all, any and find.
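
A quick sketch of three of those friends: filter keeps every item the passed function approves of, and any and all collapse a collection of booleans into a single one. The list() call is there so the code behaves the same under Python 2 and 3:

```python
nums = [0, 1, 2, 3, 4]

# Keep only the items for which the lambda returns True.
evens = list(filter(lambda x: x % 2 == 0, nums))
print(evens)
# => [0, 2, 4]

# True if at least one item satisfies the condition.
print(any(map(lambda x: x > 3, nums)))
# => True

# True only if every item satisfies the condition.
print(all(map(lambda x: x > 3, nums)))
# => False
```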

Exercise 2. Try rewriting the code below using map, reduce and filter. Filter takes a function and a collection. It returns a collection of every item for which the function returned True.

people = [{'name': 'Mary', 'height': 160},
          {'name': 'Isla', 'height': 80},
          {'name': 'Sam'}]

height_total = 0
height_count = 0
for person in people:
    if 'height' in person:
        height_total += person['height']
        height_count += 1

if height_count > 0:
    average_height = height_total / height_count

    print average_height
    # => 120

If this seems tricky, try not thinking about the operations on the data. Think of the states the data will go through, from the list of people dictionaries to the average height. Don’t try and bundle multiple transformations together. Put each on a separate line and assign the result to a descriptively-named variable. Once the code works, condense it.

My solution:

people = [{'name': 'Mary', 'height': 160},
          {'name': 'Isla', 'height': 80},
          {'name': 'Sam'}]

heights = map(lambda x: x['height'],
              filter(lambda x: 'height' in x, people))

if len(heights) > 0:
    from operator import add
    average_height = reduce(add, heights) / len(heights)

Write declaratively, not imperatively

The program below runs a race between three cars. At each time step, each car may move forwards or it may stall. At each time step, the program prints out the paths of the cars so far. After five time steps, the race is over.

This is some sample output:

-
--
--

--
--
---

---
--
---

----
---
----

----
----
-----

This is the program:

from random import random

time = 5
car_positions = [1, 1, 1]

while time:
    # decrease time
    time -= 1

    print ''
    for i in range(len(car_positions)):
        # move car
        if random() > 0.3:
            car_positions[i] += 1

        # draw car
        print '-' * car_positions[i]

The code is written imperatively. A functional version would be declarative. It would describe what to do, rather than how to do it.

Use functions

A program can be made more declarative by bundling pieces of the code into functions.

from random import random

def move_cars():
    for i, _ in enumerate(car_positions):
        if random() > 0.3:
            car_positions[i] += 1

def draw_car(car_position):
    print '-' * car_position

def run_step_of_race():
    global time
    time -= 1
    move_cars()

def draw():
    print ''
    for car_position in car_positions:
        draw_car(car_position)

time = 5
car_positions = [1, 1, 1]

while time:
    run_step_of_race()
    draw()

To understand this program, the reader just reads the main loop. “If there is time left, run a step of the race and draw. Check the time again.” If the reader wants to understand more about what it means to run a step of the race, or draw, they can read the code in those functions.

There are no comments any more. The code describes itself.

Splitting code into functions is a great, low brain power way to make code more readable.

This technique uses functions, but it uses them as sub-routines. They parcel up code. The code is not functional in the sense of the guide rope. The functions in the code use state that was not passed as arguments. They affect the code around them by changing external variables, rather than by returning values. To check what a function really does, the reader must read each line carefully. If they find an external variable, they must find its origin. They must see what other functions change that variable.

Remove state

This is a functional version of the car race code:

from random import random

def move_cars(car_positions):
    return map(lambda x: x + 1 if random() > 0.3 else x,
               car_positions)

def output_car(car_position):
    return '-' * car_position

def run_step_of_race(state):
    return {'time': state['time'] - 1,
            'car_positions': move_cars(state['car_positions'])}

def draw(state):
    print ''
    print '\n'.join(map(output_car, state['car_positions']))

def race(state):
    draw(state)
    if state['time']:
        race(run_step_of_race(state))

race({'time': 5,
      'car_positions': [1, 1, 1]})

The code is still split into functions, but the functions are functional. There are three signs of this. First, there are no longer any shared variables. time and car_positions get passed straight into race(). Second, functions take parameters. Third, no variables are instantiated inside functions. All data changes are done with return values. race() recurses3 with the result of run_step_of_race(). Each time a step generates a new state, it is passed immediately into the next step.

Now, here are two functions, zero() and one():

def zero(s):
    if s[0] == "0":
        return s[1:]

def one(s):
    if s[0] == "1":
        return s[1:]

zero() takes a string, s. If the first character is '0', it returns the rest of the string. If it is not, it returns None, the default return value of Python functions. one() does the same, but for a first character of '1'.

Imagine a function called rule_sequence(). It takes a string and a list of rule functions of the form of zero() and one(). It calls the first rule on the string. Unless None is returned, it takes the return value and calls the second rule on it. Unless None is returned, it takes the return value and calls the third rule on it. And so forth. If any rule returns None, rule_sequence() stops and returns None. Otherwise, it returns the return value of the final rule.

This is some sample input and output:

print rule_sequence('0101', [zero, one, zero])
# => 1

print rule_sequence('0101', [zero, zero])
# => None

This is the imperative version of rule_sequence():

def rule_sequence(s, rules):
    for rule in rules:
        s = rule(s)
        if s is None:
            break

    return s

Exercise 3. The code above uses a loop to do its work. Make it more declarative by rewriting it as a recursion.

My solution:

def rule_sequence(s, rules):
    if s is None or not rules:
        return s
    else:
        return rule_sequence(rules[0](s), rules[1:])

Use pipelines

In the previous section, some imperative loops were rewritten as recursions that called out to auxiliary functions. In this section, a different type of imperative loop will be rewritten using a technique called a pipeline.

The loop below performs transformations on dictionaries that hold the name, incorrect country of origin and active status of some bands.

bands = [{'name': 'sunset rubdown', 'country': 'UK', 'active': False},
         {'name': 'women', 'country': 'Germany', 'active': False},
         {'name': 'a silver mt. zion', 'country': 'Spain', 'active': True}]

def format_bands(bands):
    for band in bands:
        band['country'] = 'Canada'
        band['name'] = band['name'].replace('.', '')
        band['name'] = band['name'].title()

format_bands(bands)

print bands
# => [{'name': 'Sunset Rubdown', 'active': False, 'country': 'Canada'},
#     {'name': 'Women', 'active': False, 'country': 'Canada' },
#     {'name': 'A Silver Mt Zion', 'active': True, 'country': 'Canada'}]

Worries are stirred by the name of the function. “format” is very vague. Upon closer inspection of the code, these worries begin to claw. Three things happen in the same loop. The 'country' key gets set to 'Canada'. Punctuation is removed from the band name. The band name gets capitalized. It is hard to tell what the code is intended to do and hard to tell if it does what it appears to do. The code is hard to reuse, hard to test and hard to parallelize.

Compare it with this:

print pipeline_each(bands, [set_canada_as_country,
                            strip_punctuation_from_name,
                            capitalize_names])

This code is easy to understand. It gives the impression that the auxiliary functions are functional because they seem to be chained together. The output from the previous one comprises the input to the next. If they are functional, they are easy to verify. They are also easy to reuse, easy to test and easy to parallelize.

The job of pipeline_each() is to pass the bands, one at a time, to a transformation function, like set_canada_as_country(). After the function has been applied to all the bands, pipeline_each() bundles up the transformed bands. Then, it passes each one to the next function.

Let’s look at the transformation functions.

def assoc(_d, key, value):
    from copy import deepcopy
    d = deepcopy(_d)
    d[key] = value
    return d

def set_canada_as_country(band):
    return assoc(band, 'country', "Canada")

def strip_punctuation_from_name(band):
    return assoc(band, 'name', band['name'].replace('.', ''))

def capitalize_names(band):
    return assoc(band, 'name', band['name'].title())

Each one associates a key on a band with a new value. There is no easy way to do this without mutating the original band. assoc() solves this problem by using deepcopy() to produce a copy of the passed dictionary. Each transformation function makes its modification to the copy and returns that copy.

Everything seems fine. Band dictionary originals are protected from mutation when a key is associated with a new value. But there are two other potential mutations in the code above. In strip_punctuation_from_name(), the unpunctuated name is generated by calling replace() on the original name. In capitalize_names(), the capitalized name is generated by calling title() on the original name. If replace() and title() are not functional, strip_punctuation_from_name() and capitalize_names() are not functional.

Fortunately, replace() and title() do not mutate the strings they operate on. This is because strings are immutable in Python. When, for example, replace() operates on a band name string, it builds and returns a brand new string and leaves the original untouched. Phew.
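
This is easy to check directly. The band name here is just for demonstration:

```python
name = 'sunset rubdown.'

# replace() returns a new string; the original is unchanged.
stripped = name.replace('.', '')

print(stripped)
# => sunset rubdown
print(name)
# => sunset rubdown.
```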

This contrast between the mutability of strings and dictionaries in Python illustrates the appeal of languages like Clojure. The programmer need never think about whether they are mutating data. They aren’t.

Exercise 4. Try and write the pipeline_each function. Think about the order of operations. The bands in the array are passed, one band at a time, to the first transformation function. The bands in the resulting array are passed, one band at a time, to the second transformation function. And so forth.

My solution:

def pipeline_each(data, fns):
    return reduce(lambda a, x: map(x, a),
                  fns,
                  data)

All three transformation functions boil down to making a change to a particular field on the passed band. call() can be used to abstract that. It takes a function to apply and the key of the value to apply it to.

set_canada_as_country = call(lambda x: 'Canada', 'country')
strip_punctuation_from_name = call(lambda x: x.replace('.', ''), 'name')
capitalize_names = call(str.title, 'name')

print pipeline_each(bands, [set_canada_as_country,
                            strip_punctuation_from_name,
                            capitalize_names])

Or, if we are willing to sacrifice readability for conciseness, just:

print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
                            call(lambda x: x.replace('.', ''), 'name'),
                            call(str.title, 'name')])

The code for call():

def assoc(_d, key, value):
    from copy import deepcopy
    d = deepcopy(_d)
    d[key] = value
    return d

def call(fn, key):
    def apply_fn(record):
        return assoc(record, key, fn(record.get(key)))
    return apply_fn

There is a lot going on here. Let’s take it piece by piece.

One. call() is a higher order function. A higher order function takes a function as an argument, or returns a function. Or, like call(), it does both.

Two. apply_fn() looks very similar to the three transformation functions. It takes a record (a band). It looks up the value stored under key (via record.get(key), so a missing key yields None). It calls fn on that value. It assigns the result back to a copy of the record. It returns the copy.

Three. call() does not do any actual work. apply_fn(), when called, will do the work. In the example of using pipeline_each() above, one instance of apply_fn() will set 'country' to 'Canada' on a passed band. Another instance will capitalize the name of a passed band.

Four. When an apply_fn() instance is run, fn and key will not be in scope. They are neither arguments to apply_fn(), nor locals inside it. But they will still be accessible. When a function is defined, it saves references to the variables it closes over: those that were defined in a scope outside the function and that are used inside the function. When the function is run and its code references a variable, Python looks up the variable in the locals and in the arguments. If it doesn’t find it there, it looks in the saved references to closed over variables. This is where it will find fn and key.
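
The same mechanism shows up in a smaller closure. make_adder() below is a made-up example, separate from the band code:

```python
def make_adder(n):
    def add(x):
        # n is neither an argument to add() nor a local inside it.
        # It is found in the saved references to closed-over variables.
        return x + n
    return add

add_two = make_adder(2)
print(add_two(3))
# => 5
```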

Five. There is no mention of bands in the call() code. That is because call() could be used to generate pipeline functions for any program, regardless of topic. Functional programming is partly about building up a library of generic, reusable, composable functions.

Good job. Closures, higher order functions and variable scope all covered in the space of a few paragraphs. Have a nice glass of lemonade.

There is one more piece of band processing to do. That is to remove everything but the name and country. extract_name_and_country() can pull that information out:

def extract_name_and_country(band):
    plucked_band = {}
    plucked_band['name'] = band['name']
    plucked_band['country'] = band['country']
    return plucked_band

print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
                            call(lambda x: x.replace('.', ''), 'name'),
                            call(str.title, 'name'),
                            extract_name_and_country])

# => [{'name': 'Sunset Rubdown', 'country': 'Canada'},
#     {'name': 'Women', 'country': 'Canada'},
#     {'name': 'A Silver Mt Zion', 'country': 'Canada'}]

extract_name_and_country() could have been written as a generic function called pluck(). pluck() would be used like this:

print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
                            call(lambda x: x.replace('.', ''), 'name'),
                            call(str.title, 'name'),
                            pluck(['name', 'country'])])

Exercise 5. pluck() takes a list of keys to extract from each record. Try and write it. It will need to be a higher order function.

My solution:

def pluck(keys):
    def pluck_fn(record):
        return reduce(lambda a, x: assoc(a, x, record[x]),
                      keys,
                      {})
    return pluck_fn

What now?

Functional code co-exists very well with code written in other styles. The transformations in this article can be applied to any code base in any language. Try applying them to your own code.

Think of Mary, Isla and Sam. Turn iterations of lists into maps and reduces.

Think of the race. Break code into functions. Make those functions functional. Turn a loop that repeats a process into a recursion.

Think of the bands. Turn a sequence of operations into a pipeline.

1 An immutable piece of data is one that cannot be changed. Some languages, like Clojure, make all values immutable by default. Any “mutating” operations copy the value, change it and pass back the changed copy. This eliminates bugs that arise from a programmer’s incomplete model of the possible states their program may enter.

2 Languages that support first class functions allow functions to be treated like any other value. This means they can be created, passed to functions, returned from functions and stored inside data structures.

3 Tail call optimisation is a programming language feature. Each time a function recurses, a new stack frame is created. A stack frame is used to store the arguments and local values for the current function invocation. If a function recurses a large number of times, it is possible for the interpreter or compiler to run out of memory. Languages with tail call optimisation reuse the same stack frame for their entire sequence of recursive calls. Languages like Python that do not have tail call optimisation generally limit the number of times a function may recurse to some number in the thousands. In the case of the race() function, there are only five time steps, so it is safe.

4 Currying means decomposing a function that takes multiple arguments into a function that takes the first argument and returns a function that takes the next argument, and so forth for all the arguments.

5 Parallelization means running the same code concurrently without synchronization. These concurrent processes are often run on multiple processors.

6 Lazy evaluation is an evaluation strategy that avoids running code until its result is needed.

7 A process is deterministic if repetitions yield the same result every time.


Little Lisp interpreter

Little Lisp is an interpreter that supports function invocation, lambdas, lets, ifs, numbers, strings, a few library functions, and lists. I wrote it for a lightning talk at Hacker School to show how easy it is to write an interpreter. The code is 116 lines of JavaScript. I will explain how it works.

First, let’s learn some Lisp.

An atom, the simplest Lisp form:

1

Another atom. This time, a string:

"a"

An empty list:

()

A list containing an atom:

(1)

A list containing two atoms:

(1 2)

A list containing an atom and another list:

(1 (2))

A function invocation. This comprises a list where the first element is the function and the rest of the elements are the arguments. first takes one argument, (1 2), and returns 1.

(first (1 2))

 => 1

A lambda. That is: a function definition. The lambda takes a parameter, x, and just returns it.

(lambda (x)
 x)

A lambda invocation. This comprises a list where the first element is a lambda and the rest of the elements are the arguments. The lambda takes one argument, "Lisp", and returns it.

((lambda (x)
  x)
 "Lisp")

 => "Lisp"

Writing a Lisp interpreter is really easy.

The code for Little Lisp has two parts: the parser and the interpreter.

The parser.

Parsing has two phases: tokenizing and parenthesizing.

tokenize() takes a string of Lisp code, puts spaces around every parenthesis and splits on whitespace. For example, it takes something like ((lambda (x) x) "Lisp"), transforms it into ( ( lambda ( x ) x ) "Lisp" ) and transforms that into ['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')'].

var tokenize = function(input) {
  return input.replace(/\(/g, ' ( ')
              .replace(/\)/g, ' ) ')
              .trim()
              .split(/\s+/);
};
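Running tokenize() on the example confirms the token list. The function is reproduced here so the snippet runs standalone.

```javascript
// tokenize, as defined above: pad every parenthesis with spaces,
// trim, and split on runs of whitespace.
var tokenize = function(input) {
  return input.replace(/\(/g, ' ( ')
              .replace(/\)/g, ' ) ')
              .trim()
              .split(/\s+/);
};

console.log(tokenize('((lambda (x) x) "Lisp")'));
// → [ '(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')' ]
```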

parenthesize() takes the tokens produced by tokenize() and produces a nested array that mimics the structure of the Lisp code. Each atom in the nested array is labelled as an identifier or a literal. For example, ['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')'] is transformed into:

[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
  { type: 'identifier', value: 'x' }],
 { type: 'literal', value: 'Lisp' }]

parenthesize() goes through the tokens, one by one. If the current token is an opening parenthesis, it starts building a new array. If the current token is an atom, it labels it with its type and appends it to the current array. If the current token is a closing parenthesis, it stops building the current array and continues building the enclosing array.

var parenthesize = function(input, list) {
  if (list === undefined) {
    return parenthesize(input, []);
  } else {
    var token = input.shift();
    if (token === undefined) {
      return list.pop();
    } else if (token === "(") {
      list.push(parenthesize(input, []));
      return parenthesize(input, list);
    } else if (token === ")") {
      return list;
    } else {
      return parenthesize(input, list.concat(categorize(token)));
    }
  }
};

When parenthesize() is first called, the input parameter contains the array of tokens returned by tokenize(). For example:

['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']

When parenthesize() is first called, the list parameter is undefined. Lines 2-3 run and parenthesize() recurses with list set to an empty array.

In the recursion, line 5 runs and removes the first opening parenthesis from input. Line 9 starts a new, empty list by recursing with a new, empty array.

In the recursion, line 5 runs and removes another opening parenthesis from input. Line 9 starts another new, empty list by recursing with another new, empty array.

In the recursion, input is ['lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']. Line 14 runs with token set to lambda. It calls categorize() and passes lambda as the input argument. Line 7 of categorize() runs and returns an object with type set to identifier and value set to lambda.

var categorize = function(input) {
  if (!isNaN(parseFloat(input))) {
    return { type:'literal', value: parseFloat(input) };
  } else if (input[0] === '"' && input.slice(-1) === '"') {
    return { type:'literal', value: input.slice(1, -1) };
  } else {
    return { type:'identifier', value: input };
  }
};
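The three branches of categorize() can be seen directly. The function is reproduced from above so the snippet runs standalone.

```javascript
var categorize = function(input) {
  if (!isNaN(parseFloat(input))) {
    return { type: 'literal', value: parseFloat(input) };
  } else if (input[0] === '"' && input.slice(-1) === '"') {
    return { type: 'literal', value: input.slice(1, -1) };
  } else {
    return { type: 'identifier', value: input };
  }
};

console.log(categorize('42'));     // numbers become numeric literals
console.log(categorize('"Lisp"')); // quoted strings lose their quotes
console.log(categorize('lambda')); // everything else is an identifier
```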

Line 14 of parenthesize() appends to list the object returned by categorize() and recurses with the rest of the input and list.

var parenthesize = function(input, list) {
  if (list === undefined) {
    return parenthesize(input, []);
  } else {
    var token = input.shift();
    if (token === undefined) {
      return list.pop();
    } else if (token === "(") {
      list.push(parenthesize(input, []));
      return parenthesize(input, list);
    } else if (token === ")") {
      return list;
    } else {
      return parenthesize(input, list.concat(categorize(token)));
    }
  }
};

In the recursion, the next token is a parenthesis. Line 9 of parenthesize() starts a new, empty list by recursing with an empty array. In the recursion, input is ['x', ')', 'x', ')', '"Lisp"', ')']. Line 14 runs with token set to x. It makes a new object with a value of x and a type of identifier. It appends this object to list and recurses.

In the recursion, the next token is a closing parenthesis. Line 12 runs and returns the completed list: [{ type: 'identifier', value: 'x' }].

parenthesize() continues recursing until it has processed all of the input tokens. It returns the nested array of typed atoms.

parse() is the successive application of tokenize() and parenthesize():

var parse = function(input) {
  return parenthesize(tokenize(input));
};
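Wiring the pieces together shows the parser end to end. tokenize(), categorize() and parenthesize() are reproduced from above so the snippet runs standalone.

```javascript
var tokenize = function(input) {
  return input.replace(/\(/g, ' ( ')
              .replace(/\)/g, ' ) ')
              .trim()
              .split(/\s+/);
};

var categorize = function(input) {
  if (!isNaN(parseFloat(input))) {
    return { type: 'literal', value: parseFloat(input) };
  } else if (input[0] === '"' && input.slice(-1) === '"') {
    return { type: 'literal', value: input.slice(1, -1) };
  } else {
    return { type: 'identifier', value: input };
  }
};

var parenthesize = function(input, list) {
  if (list === undefined) {
    return parenthesize(input, []);
  } else {
    var token = input.shift();
    if (token === undefined) {
      return list.pop();
    } else if (token === "(") {
      list.push(parenthesize(input, []));
      return parenthesize(input, list);
    } else if (token === ")") {
      return list;
    } else {
      return parenthesize(input, list.concat(categorize(token)));
    }
  }
};

var parse = function(input) {
  return parenthesize(tokenize(input));
};

// The first element of the result is the lambda definition;
// the second is the "Lisp" argument.
var ast = parse('((lambda (x) x) "Lisp")');
console.log(JSON.stringify(ast, null, 1));
```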

Given a starting input of ((lambda (x) x) "Lisp"), the final output of the parser is:

[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
  { type: 'identifier', value: 'x' }],
 { type: 'literal', value: 'Lisp' }]

The interpreter.

After parsing is complete, interpreting begins.

interpret() receives the output of parse() and executes it. Given the output from the parsing example above, interpret() would construct a lambda and invoke it with the argument "Lisp". The lambda invocation would return "Lisp", which would be the output of the whole program.

As well as the input to execute, interpret() receives an execution context. This is the place where variables and their values are stored. When a piece of Lisp code is executed by interpret(), the execution context contains the variables that are accessible to that code.

These variables are stored in a hierarchy. Variables in the current scope are at the bottom of the hierarchy. Variables in the enclosing scope are in the level above. Variables in the scope enclosing the enclosing scope are in the level above that. And so on. For example, in the following code:

((lambda (a)
   ((lambda (b)
     (b a))
    "b"))
  "a")

On line 3, the execution context has two active scopes. The inner lambda forms the current scope. The outer lambda forms an enclosing scope. The current scope has b bound to "b". The enclosing scope has a bound to "a". When line 3 runs, the interpreter tries to look up b in the context. It checks the current scope, finds b and returns its value. Still on line 3, the interpreter tries to look up a. It checks the current scope and does not find a, so it tries the enclosing scope. There, it finds a and returns its value.

In Little Lisp, the execution context is modeled with an object made by calling the Context constructor. The constructor takes scope, an object that maps the variables in the current scope to their values, and parent, the enclosing context. If parent is undefined, the context is the top one: the global, outermost scope.

var Context = function(scope, parent) {
  this.scope = scope;
  this.parent = parent;

  this.get = function(identifier) {
    if (identifier in this.scope) {
      return this.scope[identifier];
    } else if (this.parent !== undefined) {
      return this.parent.get(identifier);
    }
  };
};
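The lookup chain can be exercised directly. This sketch mirrors the (lambda (a) ...) example above: the inner context binds b and points at an outer context that binds a.

```javascript
var Context = function(scope, parent) {
  this.scope = scope;
  this.parent = parent;

  this.get = function(identifier) {
    if (identifier in this.scope) {
      return this.scope[identifier];
    } else if (this.parent !== undefined) {
      return this.parent.get(identifier);
    }
  };
};

// The outer lambda's scope binds a; the inner lambda's scope binds b
// and has the outer context as its parent.
var outer = new Context({ a: 'a' });
var inner = new Context({ b: 'b' }, outer);

console.log(inner.get('b')); // found in the current scope
console.log(inner.get('a')); // not in the current scope; found in the parent
console.log(inner.get('c')); // defined in neither scope: undefined
```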

We have seen how ((lambda (x) x) "Lisp") gets parsed. Let us see how the parsed code gets executed.

var interpret = function(input, context) {
  if (context === undefined) {
    return interpret(input, new Context(library));
  } else if (input instanceof Array) {
    return interpretList(input, context);
  } else if (input.type === "identifier") {
    return context.get(input.value);
  } else {
    return input.value;
  }
};

The first time interpret() is called, context is undefined. Lines 2-3 are run to make an execution context.

When the initial context is instantiated, the constructor function takes the library object. This contains the functions built in to the language: first, rest and print. These functions are written in JavaScript.
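The library object itself does not appear in this excerpt. A minimal sketch consistent with the description, assuming Lisp lists are represented as JavaScript arrays, might look like this:

```javascript
// Assumed sketch of the built-in library; not shown in this excerpt.
var library = {
  // first returns the head of a list.
  first: function(x) { return x[0]; },

  // rest returns everything after the head.
  rest: function(x) { return x.slice(1); },

  // print logs a value and passes it through.
  print: function(x) { console.log(x); return x; }
};
```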

interpret() recurses with the original input and the new context.

input contains the full example output from the parsing section:

[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
  { type: 'identifier', value: 'x' }],
 { type: 'literal', value: 'Lisp' }]

Because input is an array and context is defined, lines 4-5 are run and interpretList() is called.

var interpretList = function(input, context) {
  if (input.length > 0 && input[0].value in special) {
    return special[input[0].value](input, context);
  } else {
    var list = input.map(function(x) { return interpret(x, context); });
    if (list[0] instanceof Function) {
      return list[0].apply(undefined, list.slice(1));
    } else {
      return list;
    }
  }
};

In interpretList(), line 5 maps over the input array and calls interpret() on each element. When interpret() is called on the lambda definition, interpretList() gets called again. This time, the input argument to interpretList() is:

[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
  { type: 'identifier', value: 'x' }]

Line 3 of interpretList() runs, because lambda, the first element in the array, is a special form. special.lambda() is called to create the lambda function.

var special = {
  lambda: function(input, context) {
    return function() {
      var lambdaArguments = arguments;
      var lambdaScope = input[1].reduce(function(acc, x, i) {
        acc[x.value] = lambdaArguments[i];
        return acc;
      }, {});

      return interpret(input[2], new Context(lambdaScope, context));
    };
  }
};

special.lambda() takes the part of the input that defines the lambda. It returns a function that, when invoked, invokes the lambda on some arguments.

Line 3 begins the definition of the lambda invocation function. Line 4 stores the arguments passed to the lambda invocation. Line 5 starts creating a new scope for the lambda’s invocation. It reduces over the part of the input that defines the parameters of the lambda: [{ type: 'identifier', value: 'x' }]. For each lambda parameter in input, it adds a key/value pair to the lambda scope that maps the parameter’s name to the corresponding argument passed to the lambda. Line 10 invokes the lambda by calling interpret() on the lambda body: { type: 'identifier', value: 'x' }. It passes in the lambda context that contains the lambda’s scope and the parent context.

The lambda is now represented by the function returned by special.lambda().

interpretList() continues mapping over the input array by calling interpret() on the second element of the list: the "Lisp" string.

var interpret = function(input, context) {
  if (context === undefined) {
    return interpret(input, new Context(library));
  } else if (input instanceof Array) {
    return interpretList(input, context);
  } else if (input.type === "identifier") {
    return context.get(input.value);
  } else {
    return input.value;
  }
};

This runs line 9 of interpret(), which just returns the value attribute of the literal object: 'Lisp'. The map operation on line 5 of interpretList() is complete. list is:

[function(args) { /* code to invoke lambda */ },
 'Lisp']

Line 6 of interpretList() runs and finds that the first element of list is a JavaScript function. This means that the list is an invocation. Line 7 runs and invokes the lambda, passing the rest of list as arguments.

var interpretList = function(input, context) {
  if (input.length > 0 && input[0].value in special) {
    return special[input[0].value](input, context);
  } else {
    var list = input.map(function(x) { return interpret(x, context); });
    if (list[0] instanceof Function) {
      return list[0].apply(undefined, list.slice(1));
    } else {
      return list;
    }
  }
};

In the lambda invocation function, line 8 calls interpret() on the lambda body, { type: 'identifier', value: 'x' }.

function() {
  var lambdaArguments = arguments;
  var lambdaScope = input[1].reduce(function(acc, x, i) {
    acc[x.value] = lambdaArguments[i];
    return acc;
  }, {});

  return interpret(input[2], new Context(lambdaScope, context));
};

Line 6 of interpret() finds that input is an identifier atom. Line 7 looks up the identifier, x, in context and returns 'Lisp'.

var interpret = function(input, context) {
  if (context === undefined) {
    return interpret(input, new Context(library));
  } else if (input instanceof Array) {
    return interpretList(input, context);
  } else if (input.type === "identifier") {
    return context.get(input.value);
  } else {
    return input.value;
  }
};

'Lisp' is returned by the lambda invocation function, which is returned by interpretList(), which is returned by interpret(), and that’s it.
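All the pieces shown above can be assembled into one runnable sketch to confirm the round trip. The library object is assumed here, since it does not appear in this excerpt.

```javascript
var tokenize = function(input) {
  return input.replace(/\(/g, ' ( ')
              .replace(/\)/g, ' ) ')
              .trim()
              .split(/\s+/);
};

var categorize = function(input) {
  if (!isNaN(parseFloat(input))) {
    return { type: 'literal', value: parseFloat(input) };
  } else if (input[0] === '"' && input.slice(-1) === '"') {
    return { type: 'literal', value: input.slice(1, -1) };
  } else {
    return { type: 'identifier', value: input };
  }
};

var parenthesize = function(input, list) {
  if (list === undefined) {
    return parenthesize(input, []);
  } else {
    var token = input.shift();
    if (token === undefined) {
      return list.pop();
    } else if (token === "(") {
      list.push(parenthesize(input, []));
      return parenthesize(input, list);
    } else if (token === ")") {
      return list;
    } else {
      return parenthesize(input, list.concat(categorize(token)));
    }
  }
};

var parse = function(input) {
  return parenthesize(tokenize(input));
};

var Context = function(scope, parent) {
  this.scope = scope;
  this.parent = parent;
  this.get = function(identifier) {
    if (identifier in this.scope) {
      return this.scope[identifier];
    } else if (this.parent !== undefined) {
      return this.parent.get(identifier);
    }
  };
};

// Assumed: a minimal library matching the built-ins the article names.
var library = {
  first: function(x) { return x[0]; },
  rest: function(x) { return x.slice(1); },
  print: function(x) { console.log(x); return x; }
};

var special = {
  lambda: function(input, context) {
    return function() {
      var lambdaArguments = arguments;
      var lambdaScope = input[1].reduce(function(acc, x, i) {
        acc[x.value] = lambdaArguments[i];
        return acc;
      }, {});
      return interpret(input[2], new Context(lambdaScope, context));
    };
  }
};

var interpretList = function(input, context) {
  if (input.length > 0 && input[0].value in special) {
    return special[input[0].value](input, context);
  } else {
    var list = input.map(function(x) { return interpret(x, context); });
    if (list[0] instanceof Function) {
      return list[0].apply(undefined, list.slice(1));
    } else {
      return list;
    }
  }
};

var interpret = function(input, context) {
  if (context === undefined) {
    return interpret(input, new Context(library));
  } else if (input instanceof Array) {
    return interpretList(input, context);
  } else if (input.type === "identifier") {
    return context.get(input.value);
  } else {
    return input.value;
  }
};

console.log(interpret(parse('((lambda (x) x) "Lisp")'))); // → Lisp
```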

Go to the GitHub repository to see all the code. And look at lis.py, the dazzlingly simple Scheme interpreter that Peter Norvig wrote in Python.