Scarface, Prince of the City and Pieter de Hooch
Brian De Palma is very good at making his scenes feel like they are part of a living world. Look at the first drug deal in Scarface. There are two shots that connect the car on the street to the rooms in the apartment where the deal is being done.
In the first shot, the camera follows Tony and Chico across the road and up the stairs to the door of the apartment.
In the second shot, the camera goes from the window of the sitting room, down into the street to the car, then back up into the apartment through the bathroom window.
But, the effect of a living world is diluted by a feeling of voyeurism. The second shot relies on the movement of the camera, rather than the movement of the objects in the scene. The camera is independent of the world.
Sidney Lumet achieves the same effect of a living world. But, his camera is mostly an observer. It is static, or it makes only unintrusive movements. It never peers. In this shot from Prince of the City, a whole block is brought alive when two running men disappear behind a building and reappear on the other side. The pan and track show us the action, but at a remove. Like the first Scarface shot, it is the movement of the characters from one place to another that brings everything to life. But, Lumet creates an even subtler effect by making the men disappear and reappear. This goes beyond connecting one visible place to another. It suggests places that are out of sight, which is more evocative of a living world.
Pieter de Hooch achieves the same effect in his painting, Paying the Hostess. Look at how light in the sky on the far left shines in through a hidden window, through a mostly obscured back room and finally falls on the wall on the far right. The light makes real the things we can’t see. The painting, too, goes one better than De Palma because it achieves its effect by making something disappear and then reappear. But, remarkably, de Hooch does it with a still image. No moving objects. No moving frame. Just moving light.
Testing from the ground up
Tests are pieces of code that check if your main code works. I write tests to catch bugs when I refactor. I write tests to force myself to think through and handle edge cases. I write tests to show the users of my project that my code does what I say it does.
For this essay, I will describe the code and tests for a tiny web app that draws a blue sky if it’s daytime, and a black sky if it’s nighttime.
I will describe all the code I wrote. The web app. The microscopic testing framework. The tests for the client side code. The mocks that fake layers of the web app and technologies that are not pertinent. The tests that use these mocks. The refactored code that simplifies the mocks by dividing the web app code into different pieces that have single responsibilities.
Along the way, I will talk about many fun things. Temporarily patching libraries written by other people. Writing code that pretends to be the internet. Making Ajax requests by hand. Writing a little web server. Examining a function to find out how many arguments it expects. Making asynchronous blocks of code run serially so they don’t tread on each other’s toes.
The code
To see the code from this essay in runnable form, go to the Testing From The Ground Up GitHub repository. At each evolution of the code, I will include a link to the corresponding commit.
The web app
This is the HTML that defines the only page in the web app. It has a canvas element that displays the sky. It loads the client side JavaScript, client.js. When the DOM is ready, it calls loadTime(), the one and only function.
<!doctype html>
<html>
  <head>
    <script src="client.js"></script>
  </head>
  <body onload="loadTime();">
    <canvas id="canvas" width="600" height="100"></canvas>
  </body>
</html>
Below is the client side JavaScript.
loadTime() starts by making an Ajax request to the server to get the current time. This is done in several steps. First, it creates an XMLHttpRequest object. Second, near the bottom of the function, it configures the object to make a GET request to "/time.json". Third, it sends the request. Fourth, when the requested time data arrives at the client, the function bound to request.onload fires.
The bound function grabs the drawing context for the canvas element. It parses "day" or "night" from the JSON returned by the server. If the value is "day", it sets the draw color to blue and draws a rectangle that takes up the whole canvas. If the value is "night", the color is black.
;(function(exports) {
  exports.loadTime = function() {
    var request = new XMLHttpRequest();
    request.onload = function(data) {
      var ctx = document.getElementById("canvas").getContext("2d");
      var time = JSON.parse(data.target.responseText).time;
      var skyColor = time === "day" ? "blue" : "black";
      ctx.fillStyle = skyColor;
      ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
    };
    request.open("GET", "/time.json", true);
    request.send();
  };
})(this);
Below is the server-side JavaScript. It is run in Node.js. Near the bottom, the code uses the Node HTTP module to create a web server. It specifies that every web request should be handled by requestHandler(). Each time this function is called, Node passes it a request object that has the URL that was requested, and a response object that can be used to send data back to the client.
If the client requested "/", the root, the index.html file is read from disk and its contents are sent back to be displayed in the user’s browser. If "/time.json" was requested, the server looks up the time, creates a piece of JSON that looks something like { "time": "day" } and sends it back to the user’s web browser.
var http = require("http");
var fs = require("fs");

var requestHandler = function (request, response) {
  if (request.url === "/") {
    fs.readFile(__dirname + "/index.html", function (err, data) {
      response.writeHead(200, { "Content-Type": "text/html" });
      response.end(data);
    });
  } else if (request.url === "/client.js") {
    fs.readFile(__dirname + "/client.js", function (err, data) {
      response.writeHead(200, { "Content-Type": "application/javascript" });
      response.end(data);
    });
  } else if (request.url === "/time.json") {
    response.writeHead(200, { "Content-Type": "application/json" });
    var hour = new Date().getHours();
    var time = hour > 6 && hour < 20 ? "day" : "night";
    response.end('{ "time": "' + time + '" }');
  }
};

http.createServer(requestHandler).listen(4000);

exports.requestHandler = requestHandler;
Here is the code for the basic web app.
A minuscule testing framework
It is possible to write tests that each include the generic code for reporting their outcome, for reporting errors, for calling the next test. But, that means a lot of repetition. It’s easier to use a testing framework. This is the one I wrote.
var test = function() {
  gatherTests(arguments).forEach(function(userTest) {
    userTest();
    process.stdout.write(".");
  });
  console.log();
};

test.isEqual = function(a, b) {
  if (a !== b) {
    throw a + " and " + b + " are not equal";
  }
};

var gatherTests = function(testArgs) {
  return Array.prototype.slice.call(testArgs).filter(function(x, i) {
    return i % 2 === 1;
  });
};

module.exports = test;
test() could be used to write a test like this:
var test = require("./test");

test(
  "should test 1 equals 1", function() {
    test.isEqual(1, 1);
  });
Which, when run, would look like this:
$ node test.js
.
How does the testing framework run the tests it is given?
test() takes a series of arguments that alternate between string descriptions and test functions. It throws away the ones that are strings, which are merely human-readable window-dressing. It puts the functions into an array. It walks through that array, calling each function and printing a period to indicate that the test passed.
The test functions use test.isEqual() to assert that certain variables have certain values. (A real testing framework would have a more permissive and pragmatic version of isEqual() that returned true in a case like test.isEqual({ love: "you" }, { love: "you" }).) If an assertion turns out to be unfounded, an exception is thrown, an error is printed and the tests stop running.
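Such a deep-comparing assertion is easy to sketch. The version below is just an illustration, not part of the framework in this essay: it falls back to strict equality for primitives and recurses over object keys otherwise.

```javascript
var isDeepEqual = function(a, b) {
  if (typeof a !== "object" || a === null ||
      typeof b !== "object" || b === null) {
    return a === b; // primitives: plain strict equality
  }

  var aKeys = Object.keys(a);
  var bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) { return false; }

  // every key in a must exist in b with a deeply equal value
  return aKeys.every(function(key) {
    return isDeepEqual(a[key], b[key]);
  });
};

console.log(isDeepEqual({ love: "you" }, { love: "you" })); // true
console.log(isDeepEqual({ love: "you" }, { love: "me" }));  // false
```

A real framework would need more care with arrays, dates and cycles, but this is the core idea.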
Mocking the server and the internet
I don’t want to have to run the server when I run the tests. That would be an extra step. It would add statefulness that would need to be reset before each test. It would necessitate network communications between the test and the server.
This is the code I wrote that pretends to be the server deciding what time it is and the internet relaying that information back to the client. It does this by faking an Ajax request. It replaces the XMLHttpRequest constructor function with a function that returns a home-made Ajax request object. This object swallows the web app’s call to open(). When the web app calls send(), it calls the function that the web app has bound to onload. It passes some fake JSON that makes it seem like it is always daytime.
global.XMLHttpRequest = function() {
  this.open = function() {};
  this.send = function() {
    this.onload({
      target: { responseText: '{ "time": "day" }' }
    });
  };
};
Mocking the canvas and the DOM
When the real code runs in a real web browser, it renders the sky to the real canvas in real blue. This is problematic. First, I don’t want to require a browser to test my code. Second, even if I capitulated to that requirement, it would be hard to check if the right thing happened. I would probably need to look at the individual pixels of the canvas drawing context to see if they were bluey.
Instead of walking down that horrid road, I wrote some code that pretends to be the browser DOM and the canvas element. It redefines getElementById() to return a fake canvas. This has a fake getContext() that returns a fake drawing context that has a fake fillRect() and a fake reference to a fake canvas that has a width and height. Instead of drawing, this function checks that the arguments passed to it have the expected values.
global.document = {
  getElementById: function() {
    return {
      getContext: function() {
        return {
          canvas: { width: 300, height: 150 },
          fillRect: function(x, y, w, h) {
            test.isEqual(x, 0);
            test.isEqual(y, 0);
            test.isEqual(w, 300);
            test.isEqual(h, 150);
            test.isEqual(this.fillStyle, "blue");
          }
        };
      }
    };
  }
};
This is the full test.
var test = require("./test");
var client = require("../client");

test(
  "should draw blue sky when it is daytime", function() {
    global.XMLHttpRequest = function() {
      this.open = function() {};
      this.send = function() {
        this.onload({
          target: { responseText: '{ "time": "day" }' }
        });
      };
    };

    global.document = {
      getElementById: function() {
        return {
          getContext: function() {
            return {
              canvas: { width: 300, height: 150 },
              fillRect: function(x, y, w, h) {
                test.isEqual(x, 0);
                test.isEqual(y, 0);
                test.isEqual(w, 300);
                test.isEqual(h, 150);
                test.isEqual(this.fillStyle, "blue");
              }
            };
          }
        };
      }
    };

    client.loadTime();
  });
Don’t worry. That code is as bad as this essay gets.
Here is the code that includes the testing framework and the first client side test.
The drawing code refactored into its own module
Those mocks are pretty horrid. They make the test very long, which discourages me from writing tests to check for other ways the code might go wrong. The mocks are horrid because the client side code is one big function that communicates with the server, asks what time it is, parses the response and draws in the canvas.
To solve this problem, I refactored the code by pulling the drawing code out into its own module, renderer. This module includes fillBackground(), a function that fills the canvas with the passed color. As a side benefit, the main web app code is now easier to understand and change.
;(function(exports) {
  exports.loadTime = function() {
    var request = new XMLHttpRequest();
    request.onload = function(data) {
      var time = JSON.parse(data.target.responseText).time;
      var color = time === "day" ? "blue" : "black";
      renderer.fillBackground(color);
    };
    request.open("GET", "/time.json", true);
    request.send();
  };

  var renderer = exports.renderer = {
    ctx: function() {
      return document.getElementById("canvas").getContext("2d");
    },

    fillBackground: function(color) {
      var ctx = this.ctx();
      ctx.fillStyle = color;
      ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
    }
  };
})(this);
This lets me replace the complex document mock with a short renderer.ctx() mock. The test becomes shorter, simpler and less brittle.
var test = require("./test");
var client = require("../client");

test(
  "should draw sun and blue sky in canvas when it is daytime", function() {
    global.XMLHttpRequest = function() {
      this.open = function() {};
      this.send = function() {
        this.onload({
          target: { responseText: '{ "time": "day" }' }
        });
      };
    };

    client.renderer.ctx = function() {
      return {
        canvas: { width: 300, height: 150 },
        fillRect: function(x, y, w, h) {
          test.isEqual(x, 0);
          test.isEqual(y, 0);
          test.isEqual(w, 300);
          test.isEqual(h, 150);
          test.isEqual(this.fillStyle, "blue");
        }
      };
    };

    client.loadTime();
  });
Here is the code for the modularised renderer and resulting simplified client side test.
I modularised the client side code further by splitting out three more functions.
This is get(). It makes an Ajax request to the passed URL. The Ajax request object calls the passed callback with the response.
var get = exports.get = function(url, callback) {
  var request = new XMLHttpRequest();
  request.onload = callback;
  request.open("GET", url, true);
  request.send();
};
This is getTime(). It uses get() to make an Ajax request to "/time.json" and parses "day" or "night" from the response.
exports.getTime = function(callback) {
  exports.get("/time.json", function(data) {
    callback(JSON.parse(data.target.responseText).time);
  });
};
This is displayTime(). It takes a string with the value "day" or "night" and draws either a blue sky or a black sky.
exports.displayTime = function(time) {
  var color = time === "day" ? "blue" : "black";
  renderer.fillBackground(color);
};
I changed the body tag in the HTML page. It now calls getTime(), passing displayTime() as the callback.
<body onload="getTime(displayTime);">
  <canvas id="canvas" width="600" height="100"></canvas>
</body>
Having more modular code means that I can mock parts of the web app API, rather than mocking code written by third parties. This makes it easier to write tests that test a specific piece of functionality.
With this refactor in place, I could write more extensive tests. The first test checks that getTime() correctly parses JSON sent by the server. The second test checks that a call to fillBackground() draws a rectangle at the correct position and size. The third test checks that displayTime() draws a rectangle of the correct color for the time of day.
var test = require("./test");
var client = require("../client");

test(
  "should parse time from server", function() {
    client.get = function(_, callback) {
      callback({ target: { responseText: '{ "time": "day" }' }});
    };
    client.getTime(function(time) { test.isEqual(time, "day"); });
  },

  "should draw rect of passed size and color when fillBackground() called", function() {
    client.renderer.ctx = function() {
      return {
        canvas: { width: 300, height: 150 },
        fillRect: function(x, y, w, h) {
          test.isEqual(x, 0);
          test.isEqual(y, 0);
          test.isEqual(w, 300);
          test.isEqual(h, 150);
        }
      };
    };
    client.renderer.fillBackground("blue");
  },

  "should draw blue sky when it is daytime", function() {
    client.renderer.fillBackground = function(color) {
      test.isEqual(color, "blue");
    };
    client.displayTime("day");
  });
Here is the code for the more modular client code and more extensive tests.
Testing the server side code
Asynchronous tests
Some of the responsibilities of a web server require asynchronous operations. If a browser sends a request to the web app for the URL "/", it gets back the contents of the index.html file. To make this happen, the file needs to be read from disk asynchronously and the contents sent to the client once they have been read. Similarly, if the browser requests the URL "/client.js", it gets back the contents of the client.js file.
These are the tests I wrote to check that both cases are handled correctly.
var test = require("./test");
var requestHandler = require("../server").requestHandler;
var http = require("http");

test(
  "should send index.html when / requested", function() {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 9), "<!doctype");
    };
    requestHandler({ url: "/" }, new http.ServerResponse({}));
  },

  "should send client.js when /client.js requested", function() {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 10), ";(function");
    };
    requestHandler({ url: "/client.js" }, new http.ServerResponse({}));
  });
The output when I run these tests:
$ node server_tests.js
..
test.js:12
throw a + " and " + b + " are not equal";
^
<!doctype and ;(function are not equal
An exception is thrown, yet there are two periods indicating two passed tests. Something is wrong.
In the first test, response.end() is mocked with a function that checks that "<!doctype" is sent to the user. Next, requestHandler() is called, requesting the URL "/". requestHandler() starts reading index.html from disk. While the file is being read, the test framework presses on with its work. Uh oh. That is, the framework prints a period and starts the second test, though the response.end() mock has not asserted the value of the response. response.end() is re-mocked with a function that checks that ";(function" is sent to the user. Double uh oh. requestHandler() is called by the second test. It requests the URL "/client.js". requestHandler() starts reading client.js from disk. The framework prints another premature period.
At some point later, index.html is read from disk. This means that the callback in requestHandler() is called and it calls response.end() with the contents of index.html. Unfortunately, by this time, response.end() has been mocked with a function expecting ";(function". The assertion fails.
This problem can be solved by running tests serially. A test is run. It signals it has finished. The next test is run.
This may seem pedantic. Shouldn’t tests that are testing asynchronous behaviour be able to cope with the dangers of asynchrony? Well, yes and no. They should certainly test the asynchronous behaviours of requestHandler(). But they should not have to cope with other tests messing about with their execution environment part way through their execution.
(It would be possible to go further and make the tests completely functionally pure. This could be done in a fundamentalist way: the test framework resets the execution context before each test. Or it could be done in a pragmatic way: each test undoes the changes it made to the execution environment. Both ways are outside the scope of this essay.)
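The pragmatic version, in which each test undoes its own changes, can at least be sketched. The helper and variable names below are mine, not from the essay’s code: the test remembers whatever was in the global before mocking it, and a finally block restores it even if an assertion throws.

```javascript
// a stand-in for the framework's assertion helper
var isEqual = function(a, b) {
  if (a !== b) { throw a + " and " + b + " are not equal"; }
};

var realDocument = global.document;  // remember whatever was there before
global.document = { title: "fake" }; // install the mock

try {
  isEqual(global.document.title, "fake"); // assertions run against the mock
} finally {
  global.document = realDocument; // undo the change, even if an assertion threw
}
```

The next test then starts from the same execution environment this one found.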
I rewrote the testing framework to run asynchronous tests serially. Each asynchronous test binds a done() callback parameter. It calls this when it has made all its assertions. The testing framework uses the execution of this callback as a signal to run the next test. Here are the rewritten tests.
var test = require("./test");
var requestHandler = require("../server").requestHandler;
var http = require("http");

test(
  "should send index.html when / requested", function(done) {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 9), "<!doctype");
      done();
    };
    requestHandler({ url: "/" }, new http.ServerResponse({}));
  },

  "should send client.js when /client.js requested", function(done) {
    http.ServerResponse.prototype.end = function(data) {
      test.isEqual(data.toString().substr(0, 10), ";(function");
      done();
    };
    requestHandler({ url: "/client.js" }, new http.ServerResponse({}));
  });
A minuscule asynchronous testing framework
Below is the code for the asynchronous testing framework.
Look at runTests(). It takes userTests, an array that contains the test functions to be run. If that array is empty, the tests are complete and the program exits. If it is not empty, it looks at the length attribute of the next test function. If the attribute has the value 1, the test expects one argument: a done() callback. It runs testAsync(), passing the test function and a callback that prints a period and recurses on runTests() with the remaining test functions.
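The arity check rests on a standard JavaScript fact: a function’s length property reports the number of parameters it declares.

```javascript
var syncTest = function() {};       // declares no parameters
var asyncTest = function(done) {};  // declares one: the done() callback

console.log(syncTest.length);  // 0
console.log(asyncTest.length); // 1
```

So a test that wants a done() callback advertises that fact simply by naming the parameter.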
testAsync() creates a timeout that will fire in one second. It runs the test function, passing a done() callback for the test to run when it is complete. If the callback gets run, the timeout is cleared and testDone() is called to indicate that the next test can run. If the done() callback is never run by the test function, something went wrong. The timeout will fire and throw an exception, and the program will exit.
If the length attribute of the test function has the value 0, the function is run with testSync(). This is the same as testAsync(), except there is no timeout and the testDone() callback is called as soon as the test function has completed.
var test = function() {
  runTests(gatherTests(arguments));
};

var runTests = function(userTests) {
  if (userTests.length === 0) {
    console.log();
    process.exit();
  } else {
    var testDone = function() {
      process.stdout.write(".");
      runTests(userTests.slice(1));
    };

    if (userTests[0].length === 1) {
      testAsync(userTests[0], testDone);
    } else {
      testSync(userTests[0], testDone);
    }
  }
};

var testSync = function(userTest, testDone) {
  userTest();
  testDone();
};

var testAsync = function(userTest, testDone) {
  var timeout = setTimeout(function() {
    throw "Failed: done() never called in async test.";
  }, 1000);

  userTest(function() {
    clearTimeout(timeout);
    testDone();
  });
};

test.isEqual = function(a, b) {
  if (a !== b) {
    throw a + " and " + b + " are not equal";
  }
};

var gatherTests = function(testArgs) {
  return Array.prototype.slice.call(testArgs).filter(function(x, i) {
    return i % 2 === 1;
  });
};

module.exports = test;
Here is the code for the asynchronous testing framework and the new server tests.
An exercise
Now that it is possible to write asynchronous tests, I can write the tests for the server. Or, rather: you can.
If you are not sure where to start, try refactoring the server so the code is more modular. Write a function that sends the passed string with the passed Content-Type to the client. Write a function that reads a static file from disk and responds with the contents. Write a function that converts the current date into "day" or "night". Write a function that takes a request for a certain URL and sends the right response.
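To get you started, here is one possible shape for the first of those functions. The name respond() is my suggestion, not something from the repository:

```javascript
// send the passed string to the client with the passed Content-Type
var respond = function(response, contentType, body) {
  response.writeHead(200, { "Content-Type": contentType });
  response.end(body);
};
```

In a test, the response argument can be mocked with an object that records its writeHead() and end() calls, just as the earlier tests mocked http.ServerResponse.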
Once your refactor is complete, or maybe while it is in progress, you can write your tests.
Summary
I wrote a simple web app. I wrote tests for it and discovered that I could mock the pieces I didn’t want to run when I ran the tests. I discovered that scary things like Ajax and the canvas API really aren’t so scary when I wrote skeletal versions of them. I realised that the mocks I had written were quite verbose. I refactored the web app code to make it more modular. This made the code better and easier to change in the future. It meant I could mock the interfaces I had written, rather than those invented by other people. This meant the mocks became simpler or unnecessary. This made it easier to write tests, so I could write more of them and test the web app more extensively.
I wrote two tests for the server. I discovered that the test framework ran them in parallel, which meant they interfered with each other. I rewrote the test framework to run tests serially. I modified the tests to signal when they were finished. I handed over the rest of the job to you.
Screencast about writing a Lisp interpreter in JavaScript
This is a recapitulation of the talk I gave at BrooklynJS in February 2014. I covered the same material at a more measured pace in this article. You can read the code on GitHub.
Moonface, live in Brooklyn
In May, I went to the barren warehouse district in Carroll Gardens in Brooklyn to see Moonface, Spencer Krug’s latest music project. I walked there from my friend Dave’s house, along wide pavements that contained nothing - no people, no benches, no parked cars, no bins, no street lights - along streets that were bordered by buildings with walls painted solid, chalky red, over a little bridge where, half way across, I could see down the canal in one of those weird long views that I used to see when I walked over the bridges in Berlin as the sun came up, one of those views you only get when you are half way across one of the numbered avenues in Manhattan, view clear for streets and streets, the sky not blocked but framed by buildings.
I arrived at the venue. Spencer Krug came on. It was just him and a piano. He sat down on the stool with a glass of what, based on the colour and volume, looked like apple juice, but was more likely to be a really very huge drink of whiskey.
He said, “I’m going to play some new songs tonight.” Most of what he played appears on his new album, “Julia With Blue Jeans On”.
The first song was about a drive he and a friend took across the US. It was about how they missed their exit because they were so engrossed in their conversation about their ex-girlfriends.
The second song had three sections. They corresponded to “Barbarian”, “Everyone is Noah, Everyone is the Ark” and “Barbarian II” on his new record. The first section was based around a persistent seesaw interval on the beautiful low notes of the piano. Spencer Krug sang, “I asked you where you want to be buried, and you asked me the name of the town where I was born.” He sang, “I am a barbarian, sometimes”, and tears came to my eyes.
The second section was, notionally, about Noah and the Ark. This section, and its relationship to the rest of the song, was an excellent summary of Spencer Krug’s entire musical catalogue. A stream of new melodies that beget more new melodies that beget more new melodies. Extended metaphors that, like Bob Dylan (“Yes, to dance beneath the diamond sky with one hand waving free, silhouetted by the sea, circled by the circus sands, with all memory and fate driven deep beneath the waves, let me forget about today until tomorrow”) combine the fantastical and the corporeal, and use the everyday details as a way to let the listener into the abstract flights of fancy: “If I am an animal I am one of the few that is self destructive. I have chewed through my beautiful muscle, I have chewed through my beautiful narrative to get out of Canada and into your door.”
But there are new elements, too. The third section of the song was a reprise. The seesaw interval and the barbarian and the tender burial wishes all came back.
The third song followed the same structure as the second: the first section introducing the themes, the second section going off into the wild, before leading into the third section’s reprise. Krug really let go at this point, bellowing, “Someone’s been writing your name all over the walls”, and singing “Love me for the way I used to be. Or, love me for the way that the skin tore open.” This song, “Love the House You’re In”, is on the album, but, strangely, this third part of the song is not.
He said, “Last night I played at, uh, Hamilton University about four and a half hours north of here in upstate New York. And there was literally a toga party going on in the same room. At one point, a woman went by in a shopping cart pushed by her friends. So, this is nicer.”
He began the fourth song by saying, “So, a lot of these are cheesy love songs and this is another one.” He ended it with a sheepish shrug of his shoulders.
He said, “It’s getting hot, better strip down”, and began taking off his shirt. Some people whooped and he said, “There’s a t-shirt under here, Ladies.” He paused. “This isn’t Hamilton University.”
The fifth song showed another new element of Spencer Krug’s music. He seems to have become a really excellent pianist. There are syncopations, trills, off-rhythm rolls that build to far more complex constructions than the medieval marching music of his keyboard playing for Sunset Rubdown and Wolf Parade. The instrumentals have elongated and become as important and complex as the parts with singing.
The seventh song was really very charming. “I’d say the only word worth singing is a name. I’d say the only name worth singing is not God, it’s you. Julia. As beautiful and simple as the sun. Julia with blue jeans on. I see you there, at the bottom of the stairs.”
The eighth and ninth songs, the last two he played, are, most regrettably, not on his album.
Before beginning the eighth song, Krug said, “This is about moving to Helsinki. It’s about a few other things too.” It includes the lines, “I didn’t know that my old life would become a swarm of swallows thinning out against the Aurora Borealis. That’s right, I said, ‘A swarm of swallows thinning out against the Aurora Borealis’.” Nine months ago, I moved to New York from England. The swarm of swallows thinning out describes perfectly the feeling of the past slipping away to leave only a future.
The ninth song was a rolling mass. For ten minutes, it got faster and more reckless and more glorious. It went, “There is an ageing arm across a naked waist. There is a fallen tree against a perfect January snow. And that’s as spiritual as I need to be.” And, “Breakwater to the sea, just strong enough to hold back the waves as they roll in. And I should have told you sooner that I’ll be relying on you, baby. And I should have told you sooner I think you’re just like me. And I guess I should have told you I am sometimes just an arm hanging out a window on the highway in the sun. Or a fallen tree against a January snowbank. Or a breakwater to the sea.”
What this doesn’t convey, what lyrics never convey, is the way all this felt as the music got faster and faster and faster and faster, taking the thematic repetitions and rolling them up into something so rich it was almost unbearable.
I recorded the set on the shitty speaker on my phone. I have listened to the tinny mp3s countless times over the last six months. I have not been able to get the gig out of my head.
A practical introduction to functional programming
Many functional programming articles teach abstract functional techniques. That is, composition, pipelining, higher order functions. This one is different. It shows examples of imperative, unfunctional code that people write every day and translates these examples to a functional style.
The first section of the article takes short, data transforming loops and translates them into functional maps and reduces. The second section takes longer loops, breaks them up into units and makes each unit functional. The third section takes a loop that is a long series of successive data transformations and decomposes it into a functional pipeline.
The examples are in Python, because many people find Python easy to read. A number of the examples eschew pythonicity in order to demonstrate functional techniques common to many languages: map, reduce, pipeline. All of the examples are in Python 2.
A guide rope
When people talk about functional programming, they mention a dizzying number of “functional” characteristics. They mention immutable data, first class functions and tail call optimisation. These are language features that aid functional programming. They mention mapping, reducing, pipelining, recursing, currying and the use of higher order functions. These are programming techniques used to write functional code. They mention parallelization, lazy evaluation and determinism. These are advantageous properties of functional programs.
Ignore all that. Functional code is characterised by one thing: the absence of side effects. It doesn’t rely on data outside the current function, and it doesn’t change data that exists outside the current function. Every other “functional” thing can be derived from this property. Use it as a guide rope as you learn.
This is an unfunctional function:
a = 0
def increment():
global a
a += 1
This is a functional function:
def increment(a):
return a + 1
Don’t iterate over lists. Use map and reduce.
Map
Map takes a function and a collection of items. It makes a new, empty collection, runs the function on each item in the original collection and inserts each return value into the new collection. It returns the new collection.
This is a simple map that takes a list of names and returns a list of the lengths of those names:
name_lengths = map(len, ["Mary", "Isla", "Sam"])
print name_lengths
# => [4, 4, 3]
This is a map that squares every number in the passed collection:
squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
print squares
# => [0, 1, 4, 9, 16]
This map doesn’t take a named function. It takes an anonymous, inlined function defined with lambda. The parameters of the lambda are defined to the left of the colon. The function body is defined to the right of the colon. The result of running the function body is (implicitly) returned.
The unfunctional code below takes a list of real names and replaces them with randomly assigned code names.
import random
names = ['Mary', 'Isla', 'Sam']
code_names = ['Mr. Pink', 'Mr. Orange', 'Mr. Blonde']
for i in range(len(names)):
names[i] = random.choice(code_names)
print names
# => ['Mr. Blonde', 'Mr. Blonde', 'Mr. Blonde']
(As you can see, this algorithm can potentially assign the same secret code name to multiple secret agents. Hopefully, this won’t be a source of confusion during the secret mission.)
This can be rewritten as a map:
import random
names = ['Mary', 'Isla', 'Sam']
secret_names = map(lambda x: random.choice(['Mr. Pink',
'Mr. Orange',
'Mr. Blonde']),
names)
Exercise 1. Try rewriting the code below as a map. It takes a list of real names and replaces them with code names produced using a more robust strategy.
names = ['Mary', 'Isla', 'Sam']
for i in range(len(names)):
names[i] = hash(names[i])
print names
# => [6306819796133686941, 8135353348168144921, -1228887169324443034]
(Hopefully, the secret agents will have good memories and won’t forget each other’s secret code names during the secret mission.)
My solution:
names = ['Mary', 'Isla', 'Sam']
secret_names = map(hash, names)
Reduce
Reduce takes a function and a collection of items. It returns a value that is created by combining the items.
This is a simple reduce. It returns the sum of all the items in the collection.
sum = reduce(lambda a, x: a + x, [0, 1, 2, 3, 4])
print sum
# => 10
x is the current item being iterated over. a is the accumulator. It is the value returned by the execution of the lambda on the previous item. reduce() walks through the items. For each one, it runs the lambda on the current a and x and returns the result as the a of the next iteration.

What is a in the first iteration? There is no previous iteration result for it to pass along. reduce() uses the first item in the collection for a in the first iteration and starts iterating at the second item. That is, the first x is the second item.
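To make the accumulator concrete, here is a small sketch (not from the original text) that finds the longest name in a list. It uses functools.reduce so it also runs under Python 3, where reduce is no longer a builtin.

```python
# reduce is imported from functools so this also runs under Python 3;
# in Python 2 it is a builtin.
from functools import reduce

names = ['Sam', 'Isla', 'Mary']

# a is the longest name seen so far; x is the current name.
# With no third argument, a starts as 'Sam' and the first x is 'Isla'.
longest = reduce(lambda a, x: x if len(x) > len(a) else a, names)
# => 'Isla'
```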
This code counts how often the word 'Sam' appears in a list of strings:
sentences = ['Mary read a story to Sam and Isla.',
'Isla cuddled Sam.',
'Sam chortled.']
sam_count = 0
for sentence in sentences:
sam_count += sentence.count('Sam')
print sam_count
# => 3
This is the same code written as a reduce:
sentences = ['Mary read a story to Sam and Isla.',
'Isla cuddled Sam.',
'Sam chortled.']
sam_count = reduce(lambda a, x: a + x.count('Sam'),
sentences,
0)
How does this code come up with its initial a? The starting point for the number of incidences of 'Sam' cannot be 'Mary read a story to Sam and Isla.' The initial accumulator is specified with the third argument to reduce(). This allows the use of a value of a different type from the items in the collection.
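As a sketch of that flexibility, the following reduce (again using functools.reduce so it also runs under Python 3) collapses a list of strings into a dictionary of name lengths. The accumulator starts as an empty dictionary, a different type from the items:

```python
from functools import reduce

names = ['Mary', 'Isla', 'Sam']

# The third argument, {}, is the initial accumulator.
# dict(a, **{x: len(x)}) builds a new dict rather than mutating a.
name_lengths = reduce(lambda a, x: dict(a, **{x: len(x)}),
                      names,
                      {})
# => {'Mary': 4, 'Isla': 4, 'Sam': 3}
```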
Why are map and reduce better?
First, they are often one-liners.
Second, the important parts of the iteration - the collection, the operation and the return value - are always in the same places in every map and reduce.
Third, the code in a loop may affect variables defined before it or code that runs after it. By convention, maps and reduces are functional.
Fourth, map and reduce are elemental operations. Every time a person reads a for loop, they have to work through the logic line by line. There are few structural regularities they can use to create a scaffolding on which to hang their understanding of the code. In contrast, map and reduce are at once building blocks that can be combined into complex algorithms, and elements that the code reader can instantly understand and abstract in their mind. “Ah, this code is transforming each item in this collection. It’s throwing some of the transformations away. It’s combining the remainder into a single output.”
Fifth, map and reduce have many friends that provide useful, tweaked versions of their basic behaviour. For example: filter, all, any and find.
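A quick sketch of some of those friends on the familiar names. Note that find is not a Python builtin; next() over a generator expression plays that role below, and list() is wrapped around filter so the example also runs under Python 3:

```python
names = ['Mary', 'Isla', 'Sam']

# filter keeps the items for which the function returns True.
short_names = list(filter(lambda x: len(x) <= 3, names))
# => ['Sam']

any_short = any(len(x) <= 3 for x in names)
# => True
all_short = all(len(x) <= 3 for x in names)
# => False

# Python has no builtin find; next() over a generator does the same job.
first_short = next((x for x in names if len(x) <= 3), None)
# => 'Sam'
```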
Exercise 2. Try rewriting the code below using map, reduce and filter. Filter takes a function and a collection. It returns a collection of every item for which the function returned True.
people = [{'name': 'Mary', 'height': 160},
{'name': 'Isla', 'height': 80},
{'name': 'Sam'}]
height_total = 0
height_count = 0
for person in people:
if 'height' in person:
height_total += person['height']
height_count += 1
if height_count > 0:
average_height = height_total / height_count
print average_height
# => 120
If this seems tricky, try not thinking about the operations on the data. Think of the states the data will go through, from the list of people dictionaries to the average height. Don’t try and bundle multiple transformations together. Put each on a separate line and assign the result to a descriptively-named variable. Once the code works, condense it.
My solution:
people = [{'name': 'Mary', 'height': 160},
{'name': 'Isla', 'height': 80},
{'name': 'Sam'}]
heights = map(lambda x: x['height'],
filter(lambda x: 'height' in x, people))
if len(heights) > 0:
from operator import add
average_height = reduce(add, heights) / len(heights)
Write declaratively, not imperatively
The program below runs a race between three cars. At each time step, each car may move forwards or it may stall. At each time step, the program prints out the paths of the cars so far. After five time steps, the race is over.
This is some sample output:
-
--
--
--
--
---
---
--
---
----
---
----
----
----
-----
This is the program:
from random import random
time = 5
car_positions = [1, 1, 1]
while time:
# decrease time
time -= 1
print ''
for i in range(len(car_positions)):
# move car
if random() > 0.3:
car_positions[i] += 1
# draw car
print '-' * car_positions[i]
The code is written imperatively. A functional version would be declarative. It would describe what to do, rather than how to do it.
Use functions
A program can be made more declarative by bundling pieces of the code into functions.
from random import random
def move_cars():
for i, _ in enumerate(car_positions):
if random() > 0.3:
car_positions[i] += 1
def draw_car(car_position):
print '-' * car_position
def run_step_of_race():
global time
time -= 1
move_cars()
def draw():
print ''
for car_position in car_positions:
draw_car(car_position)
time = 5
car_positions = [1, 1, 1]
while time:
run_step_of_race()
draw()
To understand this program, the reader just reads the main loop. “If there is time left, run a step of the race and draw. Check the time again.” If the reader wants to understand more about what it means to run a step of the race, or draw, they can read the code in those functions.
There are no comments any more. The code describes itself.
Splitting code into functions is a great, low brain power way to make code more readable.
This technique uses functions, but it uses them as sub-routines. They parcel up code. The code is not functional in the sense of the guide rope. The functions in the code use state that was not passed as arguments. They affect the code around them by changing external variables, rather than by returning values. To check what a function really does, the reader must read each line carefully. If they find an external variable, they must find its origin. They must see what other functions change that variable.
Remove state
This is a functional version of the car race code:
from random import random
def move_cars(car_positions):
return map(lambda x: x + 1 if random() > 0.3 else x,
car_positions)
def output_car(car_position):
return '-' * car_position
def run_step_of_race(state):
return {'time': state['time'] - 1,
'car_positions': move_cars(state['car_positions'])}
def draw(state):
print ''
print '\n'.join(map(output_car, state['car_positions']))
def race(state):
draw(state)
if state['time']:
race(run_step_of_race(state))
race({'time': 5,
'car_positions': [1, 1, 1]})
The code is still split into functions, but the functions are functional. There are three signs of this. First, there are no longer any shared variables. time and car_positions get passed straight into race(). Second, functions take parameters. Third, no variables are instantiated inside functions. All data changes are done with return values. race() recurses [3] with the result of run_step_of_race(). Each time a step generates a new state, it is passed immediately into the next step.
Now, here are two functions, zero() and one():
def zero(s):
if s[0] == "0":
return s[1:]
def one(s):
if s[0] == "1":
return s[1:]
zero() takes a string, s. If the first character is '0', it returns the rest of the string. If it is not, it returns None, the default return value of Python functions. one() does the same, but for a first character of '1'.
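For example, repeating the two rules so the snippet stands alone:

```python
def zero(s):
    if s[0] == "0":
        return s[1:]

def one(s):
    if s[0] == "1":
        return s[1:]

matched = zero("0101")
# => '101'
unmatched = zero("1101")
# => None, because the first character is not '0'
```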
Imagine a function called rule_sequence(). It takes a string and a list of rule functions of the form of zero() and one(). It calls the first rule on the string. Unless None is returned, it takes the return value and calls the second rule on it. Unless None is returned, it takes the return value and calls the third rule on it. And so forth. If any rule returns None, rule_sequence() stops and returns None. Otherwise, it returns the return value of the final rule.
This is some sample input and output:
print rule_sequence('0101', [zero, one, zero])
# => 1
print rule_sequence('0101', [zero, zero])
# => None
This is the imperative version of rule_sequence():
def rule_sequence(s, rules):
for rule in rules:
s = rule(s)
if s == None:
break
return s
Exercise 3. The code above uses a loop to do its work. Make it more declarative by rewriting it as a recursion.
My solution:
def rule_sequence(s, rules):
if s == None or not rules:
return s
else:
return rule_sequence(rules[0](s), rules[1:])
Use pipelines
In the previous section, some imperative loops were rewritten as recursions that called out to auxiliary functions. In this section, a different type of imperative loop will be rewritten using a technique called a pipeline.
The loop below performs transformations on dictionaries that hold the name, incorrect country of origin and active status of some bands.
bands = [{'name': 'sunset rubdown', 'country': 'UK', 'active': False},
{'name': 'women', 'country': 'Germany', 'active': False},
{'name': 'a silver mt. zion', 'country': 'Spain', 'active': True}]
def format_bands(bands):
for band in bands:
band['country'] = 'Canada'
band['name'] = band['name'].replace('.', '')
band['name'] = band['name'].title()
format_bands(bands)
print bands
# => [{'name': 'Sunset Rubdown', 'active': False, 'country': 'Canada'},
# {'name': 'Women', 'active': False, 'country': 'Canada' },
# {'name': 'A Silver Mt Zion', 'active': True, 'country': 'Canada'}]
Worries are stirred by the name of the function. “format” is very vague. Upon closer inspection of the code, these worries begin to claw. Three things happen in the same loop. The 'country' key gets set to 'Canada'. Punctuation is removed from the band name. The band name gets capitalized. It is hard to tell what the code is intended to do and hard to tell if it does what it appears to do. The code is hard to reuse, hard to test and hard to parallelize.
Compare it with this:
print pipeline_each(bands, [set_canada_as_country,
strip_punctuation_from_name,
capitalize_names])
This code is easy to understand. It gives the impression that the auxiliary functions are functional because they seem to be chained together. The output from the previous one comprises the input to the next. If they are functional, they are easy to verify. They are also easy to reuse, easy to test and easy to parallelize.
The job of pipeline_each() is to pass the bands, one at a time, to a transformation function, like set_canada_as_country(). After the function has been applied to all the bands, pipeline_each() bundles up the transformed bands. Then, it passes each one to the next function.
Let’s look at the transformation functions.
def assoc(_d, key, value):
from copy import deepcopy
d = deepcopy(_d)
d[key] = value
return d
def set_canada_as_country(band):
return assoc(band, 'country', "Canada")
def strip_punctuation_from_name(band):
return assoc(band, 'name', band['name'].replace('.', ''))
def capitalize_names(band):
return assoc(band, 'name', band['name'].title())
Each one associates a key on a band with a new value. There is no easy way to do this without mutating the original band. assoc() solves this problem by using deepcopy() to produce a copy of the passed dictionary. Each transformation function makes its modification to the copy and returns that copy.
Everything seems fine. Band dictionary originals are protected from mutation when a key is associated with a new value. But there are two other potential mutations in the code above. In strip_punctuation_from_name(), the unpunctuated name is generated by calling replace() on the original name. In capitalize_names(), the capitalized name is generated by calling title() on the original name. If replace() and title() are not functional, strip_punctuation_from_name() and capitalize_names() are not functional.
Fortunately, replace() and title() do not mutate the strings they operate on. This is because strings are immutable in Python. When, for example, replace() operates on a band name string, it builds and returns a new string and leaves the original band name untouched. Phew.
This contrast between the mutability of strings and dictionaries in Python illustrates the appeal of languages like Clojure. The programmer need never think about whether they are mutating data. They aren’t.
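The contrast can be seen directly. This sketch is not from the original article; it just demonstrates the two behaviours:

```python
# Strings are immutable: replace() returns a new string and leaves
# the original untouched.
name = 'a silver mt. zion'
stripped = name.replace('.', '')
# name is still 'a silver mt. zion'

# Dictionaries are mutable: assignment through any reference changes
# the one shared object. This is why assoc() reaches for deepcopy().
band = {'name': 'women'}
alias = band              # a second reference, not a copy
alias['name'] = 'Women'
# band['name'] is now 'Women' too
```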
Exercise 4. Try and write the pipeline_each function. Think about the order of operations. The bands in the array are passed, one band at a time, to the first transformation function. The bands in the resulting array are passed, one band at a time, to the second transformation function. And so forth.
My solution:
def pipeline_each(data, fns):
return reduce(lambda a, x: map(x, a),
fns,
data)
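To watch the order of operations, here is the same pipeline_each() run on plain numbers, restated with list() around map and functools.reduce so it also runs under Python 3. Every item goes through the first function before any item reaches the second:

```python
from functools import reduce

def pipeline_each(data, fns):
    # Each pass maps one function over the whole collection.
    return reduce(lambda a, x: list(map(x, a)),
                  fns,
                  data)

result = pipeline_each([1, 2, 3], [lambda x: x + 1,    # [2, 3, 4]
                                   lambda x: x * 10])
# => [20, 30, 40]
```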
All three transformation functions boil down to making a change to a particular field on the passed band. call() can be used to abstract that. It takes a function to apply and the key of the value to apply it to.
set_canada_as_country = call(lambda x: 'Canada', 'country')
strip_punctuation_from_name = call(lambda x: x.replace('.', ''), 'name')
capitalize_names = call(str.title, 'name')
print pipeline_each(bands, [set_canada_as_country,
strip_punctuation_from_name,
capitalize_names])
Or, if we are willing to sacrifice readability for conciseness, just:
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
call(lambda x: x.replace('.', ''), 'name'),
call(str.title, 'name')])
The code for call():
def assoc(_d, key, value):
from copy import deepcopy
d = deepcopy(_d)
d[key] = value
return d
def call(fn, key):
def apply_fn(record):
return assoc(record, key, fn(record.get(key)))
return apply_fn
There is a lot going on here. Let’s take it piece by piece.
One. call() is a higher order function. A higher order function takes a function as an argument, or returns a function. Or, like call(), it does both.
Two. apply_fn() looks very similar to the three transformation functions. It takes a record (a band). It looks up the value at record[key]. It calls fn on that value. It assigns the result back to a copy of the record. It returns the copy.
Three. call() does not do any actual work. apply_fn(), when called, will do the work. In the example of using pipeline_each() above, one instance of apply_fn() will set 'country' to 'Canada' on a passed band. Another instance will capitalize the name of a passed band.
Four. When an apply_fn() instance is run, fn and key will not be in scope. They are neither arguments to apply_fn(), nor locals inside it. But they will still be accessible. When a function is defined, it saves references to the variables it closes over: those that were defined in a scope outside the function and that are used inside the function. When the function is run and its code references a variable, Python looks up the variable in the locals and in the arguments. If it doesn’t find it there, it looks in the saved references to closed over variables. This is where it will find fn and key.
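A minimal closure sketch in the same shape as call(). The names make_incrementer and increment are invented for illustration; n is neither an argument to increment nor a local inside it, but it remains accessible after make_incrementer returns:

```python
def make_incrementer(n):
    def increment(x):
        # n is found in the saved references to closed over variables.
        return x + n
    return increment

add_two = make_incrementer(2)
result = add_two(10)
# => 12
```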
Five. There is no mention of bands in the call() code. That is because call() could be used to generate pipeline functions for any program, regardless of topic. Functional programming is partly about building up a library of generic, reusable, composable functions.
Good job. Closures, higher order functions and variable scope all covered in the space of a few paragraphs. Have a nice glass of lemonade.
There is one more piece of band processing to do. That is to remove everything but the name and country. extract_name_and_country() can pull that information out:
def extract_name_and_country(band):
plucked_band = {}
plucked_band['name'] = band['name']
plucked_band['country'] = band['country']
return plucked_band
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
call(lambda x: x.replace('.', ''), 'name'),
call(str.title, 'name'),
extract_name_and_country])
# => [{'name': 'Sunset Rubdown', 'country': 'Canada'},
# {'name': 'Women', 'country': 'Canada'},
# {'name': 'A Silver Mt Zion', 'country': 'Canada'}]
extract_name_and_country() could have been written as a generic function called pluck(). pluck() would be used like this:
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
call(lambda x: x.replace('.', ''), 'name'),
call(str.title, 'name'),
pluck(['name', 'country'])])
Exercise 5. pluck() takes a list of keys to extract from each record. Try and write it. It will need to be a higher order function.
My solution:
def pluck(keys):
def pluck_fn(record):
return reduce(lambda a, x: assoc(a, x, record[x]),
keys,
{})
return pluck_fn
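Trying it out on a single record. assoc() and pluck() are repeated from above so the snippet stands alone, and functools.reduce is imported so it also runs under Python 3:

```python
from copy import deepcopy
from functools import reduce

def assoc(_d, key, value):
    d = deepcopy(_d)
    d[key] = value
    return d

def pluck(keys):
    def pluck_fn(record):
        return reduce(lambda a, x: assoc(a, x, record[x]),
                      keys,
                      {})
    return pluck_fn

band = {'name': 'Women', 'country': 'Canada', 'active': False}
plucked = pluck(['name', 'country'])(band)
# => {'name': 'Women', 'country': 'Canada'}
# The original band is untouched; it still has its 'active' key.
```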
What now?
Functional code co-exists very well with code written in other styles. The transformations in this article can be applied to any code base in any language. Try applying them to your own code.
Think of Mary, Isla and Sam. Turn iterations of lists into maps and reduces.
Think of the race. Break code into functions. Make those functions functional. Turn a loop that repeats a process into a recursion.
Think of the bands. Turn a sequence of operations into a pipeline.
[1] An immutable piece of data is one that cannot be changed. Some languages, like Clojure, make all values immutable by default. Any “mutating” operations copy the value, change it and pass back the changed copy. This eliminates bugs that arise from a programmer’s incomplete model of the possible states their program may enter.
[2] Languages that support first class functions allow functions to be treated like any other value. This means they can be created, passed to functions, returned from functions and stored inside data structures.
[3] Tail call optimisation is a programming language feature. Each time a function recurses, a new stack frame is created. A stack frame is used to store the arguments and local values for the current function invocation. If a function recurses a large number of times, it is possible for the interpreter or compiler to run out of memory. Languages with tail call optimisation reuse the same stack frame for their entire sequence of recursive calls. Languages like Python that do not have tail call optimisation generally limit the number of times a function may recurse to some number in the thousands. In the case of the race() function, there are only five time steps, so it is safe.
[4] Currying means decomposing a function that takes multiple arguments into a function that takes the first argument and returns a function that takes the next argument, and so forth for all the arguments.
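A sketch of the idea in Python, with invented names: add takes the first argument and returns a function that takes the next.

```python
def add(a):
    def add_a(b):
        return a + b
    return add_a

total = add(2)(3)   # each call supplies one argument
# => 5
```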
[5] Parallelization means running the same code concurrently without synchronization. These concurrent processes are often run on multiple processors.
[6] Lazy evaluation is a compiler technique that avoids running code until the result is needed.
[7] A process is deterministic if repetitions yield the same result every time.
Little Lisp interpreter
Little Lisp is an interpreter that supports function invocation, lambdas, lets, ifs, numbers, strings, a few library functions, and lists. I wrote it for a lightning talk at the Recurse Center to show how easy it is to write an interpreter. The code is 116 lines of JavaScript. I will explain how it works.
First, let’s learn some Lisp.
An atom, the simplest Lisp form:
1
Another atom. This time, a string:
"a"
An empty list:
()
A list containing an atom:
(1)
A list containing two atoms:
(1 2)
A list containing an atom and another list:
(1 (2))
A function invocation. This comprises a list where the first element is the function and the rest of the elements are the arguments. first takes one argument, (1 2), and returns 1.
(first (1 2))
=> 1
A lambda. That is: a function definition. The lambda takes a parameter, x, and just returns it.
(lambda (x)
x)
A lambda invocation. This comprises a list where the first element is a lambda and the rest of the elements are the arguments. The lambda takes one argument, "Lisp", and returns it.
((lambda (x)
x)
"Lisp")
=> "Lisp"
Writing a Lisp interpreter is really easy.
The code for Little Lisp has two parts: the parser and the interpreter.
The parser.
Parsing has two phases: tokenizing and parenthesizing.
tokenize() takes a string of Lisp code, puts spaces around every parenthesis and splits on whitespace. For example, it takes something like ((lambda (x) x) "Lisp"), transforms it into ( ( lambda ( x ) x ) "Lisp" ) and transforms that into ['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')'].
var tokenize = function(input) {
return input.replace(/\(/g, ' ( ')
.replace(/\)/g, ' ) ')
.trim()
.split(/\s+/);
};
parenthesize() takes the tokens produced by tokenize() and produces a nested array that mimics the structure of the Lisp code. Each atom in the nested array is labelled as an identifier or a literal. For example, ['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')'] is transformed into:
[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }],
{ type: 'literal', value: 'Lisp' }]
parenthesize() goes through the tokens, one by one. If the current token is an opening parenthesis, it starts building a new array. If the current token is an atom, it labels it with its type and appends it to the current array. If the current token is a closing parenthesis, it stops building the current array and continues building the enclosing array.
var parenthesize = function(input, list) {
if (list === undefined) {
return parenthesize(input, []);
} else {
var token = input.shift();
if (token === undefined) {
return list.pop();
} else if (token === "(") {
list.push(parenthesize(input, []));
return parenthesize(input, list);
} else if (token === ")") {
return list;
} else {
return parenthesize(input, list.concat(categorize(token)));
}
}
};
When parenthesize() is first called, the input parameter contains the array of tokens returned by tokenize(). For example:
['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']
When parenthesize() is first called, the list parameter is undefined. Lines 2-3 run and parenthesize() recurses with list set to an empty array.
In the recursion, line 5 runs and removes the first opening parenthesis from input. Line 9 starts a new, empty list by recursing with a new, empty array.

In the recursion, line 5 runs and removes another opening parenthesis from input. Line 9 starts another new, empty list by recursing with another new, empty array.

In the recursion, input is ['lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']. Line 14 runs with token set to lambda. It calls categorize() and passes lambda as the input argument. Line 7 of categorize() runs and returns an object with type set to identifier and value set to lambda.
var categorize = function(input) {
if (!isNaN(parseFloat(input))) {
return { type:'literal', value: parseFloat(input) };
} else if (input[0] === '"' && input.slice(-1) === '"') {
return { type:'literal', value: input.slice(1, -1) };
} else {
return { type:'identifier', value: input };
}
};
Line 14 of parenthesize() appends to list the object returned by categorize() and recurses with the rest of the input and list.
var parenthesize = function(input, list) {
if (list === undefined) {
return parenthesize(input, []);
} else {
var token = input.shift();
if (token === undefined) {
return list.pop();
} else if (token === "(") {
list.push(parenthesize(input, []));
return parenthesize(input, list);
} else if (token === ")") {
return list;
} else {
return parenthesize(input, list.concat(categorize(token)));
}
}
};
In the recursion, the next token is a parenthesis. Line 9 of parenthesize() starts a new, empty list by recursing with an empty array. In the recursion, input is ['x', ')', 'x', ')', '"Lisp"', ')']. Line 14 runs with token set to x. It makes a new object with a value of x and a type of identifier. It appends this object to list and recurses.

In the recursion, the next token is a closing parenthesis. Line 12 runs and returns the completed list: [{ type: 'identifier', value: 'x' }].

parenthesize() continues recursing until it has processed all of the input tokens. It returns the nested array of typed atoms.
parse() is the successive application of tokenize() and parenthesize():
var parse = function(input) {
return parenthesize(tokenize(input));
};
Given a starting input of ((lambda (x) x) "Lisp"), the final output of the parser is:
[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }],
{ type: 'literal', value: 'Lisp' }]
The interpreter.
After parsing is complete, interpreting begins.
interpret() receives the output of parse() and executes it. Given the output from the parsing example above, interpret() would construct a lambda and invoke it with the argument "Lisp". The lambda invocation would return "Lisp", which would be the output of the whole program.

As well as the input to execute, interpret() receives an execution context. This is the place where variables and their values are stored. When a piece of Lisp code is executed by interpret(), the execution context contains the variables that are accessible to that code.
These variables are stored in a hierarchy. Variables in the current scope are at the bottom of the hierarchy. Variables in the enclosing scope are in the level above. Variables in the scope enclosing the enclosing scope are in the level above that. And so on. For example, in the following code:
((lambda (a)
((lambda (b)
(b a))
"b"))
"a")
On line 3, the execution context has two active scopes. The inner lambda forms the current scope. The outer lambda forms an enclosing scope. The current scope has b bound to "b". The enclosing scope has a bound to "a". When line 3 runs, the interpreter tries to look up b in the context. It checks the current scope, finds b and returns its value. Still on line 3, the interpreter tries to look up a. It checks the current scope and does not find a, so it tries the enclosing scope. There, it finds a and returns its value.
In Little Lisp, the execution context is modeled with an object made by calling the Context constructor. This takes scope, an object that contains variables and their values in the current scope. And it takes parent. If parent is undefined, the scope is the top, or global, or outermost one.
var Context = function(scope, parent) {
this.scope = scope;
this.parent = parent;
this.get = function(identifier) {
if (identifier in this.scope) {
return this.scope[identifier];
} else if (this.parent !== undefined) {
return this.parent.get(identifier);
}
};
};
We have seen how ((lambda (x) x) "Lisp") gets parsed. Let us see how the parsed code gets executed.
var interpret = function(input, context) {
if (context === undefined) {
return interpret(input, new Context(library));
} else if (input instanceof Array) {
return interpretList(input, context);
} else if (input.type === "identifier") {
return context.get(input.value);
} else {
return input.value;
}
};
The first time interpret() is called, context is undefined. Lines 2-3 are run to make an execution context.

When the initial context is instantiated, the constructor function takes the library object. This contains the functions built in to the language: first, rest and print. These functions are written in JavaScript.

interpret() recurses with the original input and the new context.

input contains the full example output from the parsing section:
[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }],
{ type: 'literal', value: 'Lisp' }]
Because input is an array and context is defined, lines 4-5 are run and interpretList() is called.
var interpretList = function(input, context) {
if (input.length > 0 && input[0].value in special) {
return special[input[0].value](input, context);
} else {
var list = input.map(function(x) { return interpret(x, context); });
if (list[0] instanceof Function) {
return list[0].apply(undefined, list.slice(1));
} else {
return list;
}
}
};
In interpretList(), line 5 maps over the input array and calls interpret() on each element. When interpret() is called on the lambda definition, interpretList() gets called again. This time, the input argument to interpretList() is:
[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }]
Line 3 of interpretList() gets called, because lambda, the first element in the array, is a special form. special.lambda() is called to create the lambda function.
var special = {
lambda: function(input, context) {
return function() {
var lambdaArguments = arguments;
var lambdaScope = input[1].reduce(function(acc, x, i) {
acc[x.value] = lambdaArguments[i];
return acc;
}, {});
return interpret(input[2], new Context(lambdaScope, context));
};
}
};
special.lambda() takes the part of the input that defines the lambda. It returns a function that, when invoked, invokes the lambda on some arguments.
Line 3 begins the definition of the lambda invocation function. Line 4 stores the arguments passed to the lambda invocation. Line 5 starts creating a new scope for the lambda’s invocation. It reduces over the part of the input that defines the parameters of the lambda: [{ type: 'identifier', value: 'x' }]. It adds a key/value pair to the lambda scope for each lambda parameter in input and argument passed to the lambda. Line 9 invokes the lambda by calling interpret() on the lambda body: { type: 'identifier', value: 'x' }. It passes in the lambda context that contains the lambda’s scope and the parent context.
The lambda is now represented by the function returned by special.lambda().
interpretList() continues mapping over the input array by calling interpret() on the second element of the list: the "Lisp" string.
var interpret = function(input, context) {
if (context === undefined) {
return interpret(input, new Context(library));
} else if (input instanceof Array) {
return interpretList(input, context);
} else if (input.type === "identifier") {
return context.get(input.value);
} else {
return input.value;
}
};
This runs line 9 of interpret(), which just returns the value attribute of the literal object: 'Lisp'. The map operation on line 5 of interpretList() is complete. list is:
[function(args) { /* code to invoke lambda */ },
'Lisp']
Line 6 of interpretList() runs and finds that the first element of list is a JavaScript function. This means that the list is an invocation. Line 7 runs and invokes the lambda, passing the rest of list as arguments.
var interpretList = function(input, context) {
if (input.length > 0 && input[0].value in special) {
return special[input[0].value](input, context);
} else {
var list = input.map(function(x) { return interpret(x, context); });
if (list[0] instanceof Function) {
return list[0].apply(undefined, list.slice(1));
} else {
return list;
}
}
};
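The invocation branch on line 7 can be sketched on its own. list here stands in for the mapped array from our example:

```javascript
// After mapping, list holds the lambda invocation function followed by its
// arguments. apply() calls the function with everything after the first element.
var list = [function(x) { return x; }, 'Lisp'];
var result = list[0].apply(undefined, list.slice(1));

console.log(result); // 'Lisp'
```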
In the lambda invocation function, line 8 calls interpret() on the lambda body, { type: 'identifier', value: 'x' }.
function() {
var lambdaArguments = arguments;
var lambdaScope = input[1].reduce(function(acc, x, i) {
acc[x.value] = lambdaArguments[i];
return acc;
}, {});
return interpret(input[2], new Context(lambdaScope, context));
};
Line 6 of interpret() finds that input is an identifier atom. Line 7 looks up the identifier, x, in context and returns 'Lisp'.
var interpret = function(input, context) {
if (context === undefined) {
return interpret(input, new Context(library));
} else if (input instanceof Array) {
return interpretList(input, context);
} else if (input.type === "identifier") {
return context.get(input.value);
} else {
return input.value;
}
};
'Lisp' is returned by the lambda invocation function, which is returned by interpretList(), which is returned by interpret(), and that’s it.
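Putting the pieces together, here is a minimal end-to-end sketch of the evaluation just traced. The Context constructor is my guess at the one defined earlier in the post, and the 'literal' type tag on the string atom is an assumption too:

```javascript
var library = {};

// A guess at Context: a scope plus a parent context to fall back on.
var Context = function(scope, parent) {
  this.scope = scope;
  this.parent = parent;
};

Context.prototype.get = function(identifier) {
  if (identifier in this.scope) {
    return this.scope[identifier];
  } else if (this.parent !== undefined) {
    return this.parent.get(identifier);
  }
};

var special = {
  lambda: function(input, context) {
    return function() {
      var lambdaArguments = arguments;
      var lambdaScope = input[1].reduce(function(acc, x, i) {
        acc[x.value] = lambdaArguments[i];
        return acc;
      }, {});
      return interpret(input[2], new Context(lambdaScope, context));
    };
  }
};

var interpretList = function(input, context) {
  if (input.length > 0 && input[0].value in special) {
    return special[input[0].value](input, context);
  } else {
    var list = input.map(function(x) { return interpret(x, context); });
    if (list[0] instanceof Function) {
      return list[0].apply(undefined, list.slice(1));
    } else {
      return list;
    }
  }
};

var interpret = function(input, context) {
  if (context === undefined) {
    return interpret(input, new Context(library));
  } else if (input instanceof Array) {
    return interpretList(input, context);
  } else if (input.type === "identifier") {
    return context.get(input.value);
  } else {
    return input.value;
  }
};

// The parsed form of ((lambda (x) x) "Lisp"); the 'literal' tag is assumed.
var ast = [[{ type: 'identifier', value: 'lambda' },
            [{ type: 'identifier', value: 'x' }],
            { type: 'identifier', value: 'x' }],
           { type: 'literal', value: 'Lisp' }];

console.log(interpret(ast)); // 'Lisp'
```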
Go to the GitHub repository to see all the code. And look at lis.py, the dazzlingly simple Scheme interpreter that Peter Norvig wrote in Python.
Coquette
I’ve just released Coquette, a micro framework for JavaScript games. It handles collision detection, the game update loop, keyboard input and canvas rendering. I wrote it in a panic when, two hours before the deadline of April’s Ludum Dare, I discovered that entries cannot use closed source library code.
A pop quiz purporting to be about scope in JavaScript that is actually a polemic about why modules are a good idea
This is the blog post version of a lightning talk I gave at the Recurse Center last week.
I will ask you some questions. For each question, there will be some JavaScript files loaded into an HTML page. In these JavaScript files, there will be some console.log() invocations. Your job is to predict what happens after each invocation.
For this first question, there is one JavaScript file, a.js, loaded into this HTML page:
<!doctype html>
<html>
<head>
<script src="a.js"></script>
</head>
</html>
One
// a.js
var blah = "blah";
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah will be printed. The variable, blah, is in the same scope as the console.log() invocation.
For the rest of the questions, there will be two JavaScript files, a.js and b.js, loaded into this HTML page:
<!doctype html>
<html>
<head>
<script src="a.js"></script>
<script src="b.js"></script>
</head>
</html>
Two
// a.js
blah = "blah";
// b.js
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah will be printed. It doesn’t matter that blah was declared in a different file. All the JavaScript files loaded into a page share one scope.
Three
// a.js
var blah = "blah";
// b.js
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah will be printed. It doesn’t matter that the var keyword was used to declare blah. The declaration was in a shared scope, and that scope is global, so blah becomes a global variable.
Four
// a.js
var blahFn = function() {
return "blah";
};
// b.js
console.log(blahFn()); // Option 1: ReferenceError: blahFn is not defined
// Option 2: print 'blah' to the console
blah will be printed. All the logic shown so far applies to functions.
Five
// a.js
function blahFn() {
var blah = "blah";
};
// b.js
blahFn();
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
There will be a ReferenceError. The blah variable is created inside a function, in the function’s own, private scope. It is not available in the global scope.
Six
// a.js
function blahFn() {
blah = "blah";
};
// b.js
blahFn();
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah will be printed. With the var keyword omitted, the blah variable goes into the global scope.
Seven
// a.js
;(function() {
function blahFn() {
return "blah";
};
console.log(blahFn()); // Option 1: ReferenceError: blahFn is not defined
// Option 2: print 'blah' to the console
})();
a.js is loaded. The anonymous function at the top is invoked, creating our first JavaScript module. The module instantiates blahFn. The console.log() invocation prints blah because the code inside a module has a shared scope.
// b.js
console.log(blahFn()); // Option 1: ReferenceError: blahFn is not defined
// Option 2: print 'blah' to the console
When console.log() is invoked in b.js, a ReferenceError is thrown. The variables inside the module in a.js are locked away. The global scope is not polluted.
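The module boundary can be checked directly with typeof, which doesn’t throw on undeclared names. This is a standalone sketch, not code from the quiz:

```javascript
// blahFn exists inside the module's scope but not outside it.
var insideType, outsideType;

;(function() {
  function blahFn() { return "blah"; }
  insideType = typeof blahFn;
})();

outsideType = typeof blahFn; // typeof is safe on undeclared names

console.log(insideType, outsideType); // "function" "undefined"
```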
Eight
// a.js
;(function(exports) {
var Blah = {};
Blah.blahFn = function() {
return "blah";
};
exports.Blah = Blah;
})(this);
// b.js
console.log(Blah.blahFn()); // Option 1: ReferenceError: Blah is not defined
// Option 2: print 'blah' to the console
When the console.log() is invoked, blah is printed. The module has made itself available as an object in the global scope. It has also made one of its variables available on this object.

To do this, it made an empty object, Blah, and attached blahFn to it as an attribute. The anonymous function enclosing the module was passed this. Because the anonymous function was invoked in the global context, this is the global object, also referenced with window. The module attached Blah to the exports/global/window object, making it available to the console.log() invocation.
Nine
// a.js
(function woo() {
return function() {
console.log("woo")
};
})
// b.js
(function() {
console.log("boo")
})();
What is printed? woo or boo?

It’s boo. The code in a.js makes a function called woo and does nothing with it.
In questions seven and eight, both modules were preceded by a semicolon. Why is this?
Often, a module is put in its own JavaScript file. And, often, the JavaScript files that make up a website are joined together to reduce download time. If you join a.js and b.js together and then execute the result, woo will be printed. The operation of the boo module was changed by the random code preceding it in the a.js file. Because the random code in a.js had no semicolon at the end, the parentheses around the boo module invoked the woo function. This returned the anonymous function inside the woo function, which was then invoked by the parentheses sitting at the end of b.js.
Modules protect themselves from these sorts of nightmares with a leading semicolon.
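To see the failure mode concretely, here is a sketch of the two files concatenated without the protective semicolon. The module bodies return strings instead of logging so the result is easy to inspect:

```javascript
// a.js and b.js joined with no semicolon between them. The parentheses that
// should start b.js's module instead pass the boo function to woo as an
// argument, and woo's return value is then invoked by the trailing ().
var result = (function woo() {
  return function() { return "woo"; };
})
(function() {
  return "boo";
})();

console.log(result); // "woo", not "boo"
```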