Screencast about writing a Lisp interpreter in JavaScript
This is a recapitulation of the talk I gave at BrooklynJS in February 2014. I covered the same material at a more measured pace in this article. You can read the code on GitHub.
Moonface, live in Brooklyn
In May, I went to the barren warehouse district in Carroll Gardens in Brooklyn to see Moonface, Spencer Krug’s latest music project. I walked there from my friend Dave’s house, along wide pavements that contained nothing - no people, no benches, no parked cars, no bins, no street lights - along streets that were bordered by buildings with walls painted solid, chalky red, over a little bridge where, half way across, I could see down the canal in one of those weird long views that I used to see when I walked over the bridges in Berlin as the sun came up, one of those views you only get when you are half way across one of the numbered avenues in Manhattan, view clear for streets and streets, the sky not blocked but framed by buildings.
I arrived at the venue. Spencer Krug came on. It was just him and a piano. He sat down on the stool with a glass of what, based on the colour and volume, looked like apple juice, but was more likely to be a really very huge drink of whiskey.
He said, “I’m going to play some new songs tonight.” Most of what he played appears on his new album, “Julia With Blue Jeans On”.
The first song was about a drive he and a friend took across the US. It was about how they missed their exit because they were so engrossed in their conversation about their ex-girlfriends.
The second song had three sections. They corresponded to “Barbarian”, “Everyone is Noah, Everyone is the Ark” and “Barbarian II” on his new record. The first section was based around a persistent seesaw interval on the beautiful low notes of the piano. Spencer Krug sang, “I asked you where you want to be buried, and you asked me the name of the town where I was born.” He sang, “I am a barbarian, sometimes”, and tears came to my eyes.
The second section was, notionally, about Noah and the Ark. This section, and its relationship to the rest of the song, was an excellent summary of Spencer Krug’s entire musical catalogue. A stream of new melodies that beget more new melodies that beget more new melodies. Extended metaphors that, like Bob Dylan (“Yes, to dance beneath the diamond sky with one hand waving free, silhouetted by the sea, circled by the circus sands, with all memory and fate driven deep beneath the waves, let me forget about today until tomorrow”) combine the fantastical and the corporeal, and use the everyday details as a way to let the listener into the abstract flights of fancy: “If I am an animal I am one of the few that is self destructive. I have chewed through my beautiful muscle, I have chewed through my beautiful narrative to get out of Canada and into your door.”
But there are new elements, too. The third section of the song was a reprise. The seesaw interval and the barbarian and the tender burial wishes all came back.
The third song followed the same structure as the second: the first section introducing the themes, the second section going off into the wild, before leading into the third section’s reprise. Krug really let go at this point, bellowing, “Someone’s been writing your name all over the walls”, and singing “Love me for the way I used to be. Or, love me for the way that the skin tore open.” This song, “Love the House You’re In”, is on the album, but, strangely, this third part of the song is not.
He said, “Last night I played at, uh, Hamilton University about four and a half hours north of here in upstate New York. And there was literally a toga party going on in the same room. At one point, a woman went by in a shopping cart pushed by her friends. So, this is nicer.”
He began the fourth song by saying, “So, a lot of these are cheesy love songs and this is another one.” He ended it with a sheepish shrug of his shoulders.
He said, “It’s getting hot, better strip down”, and began taking off his shirt. Some people whooped and he said, “There’s a t-shirt under here, Ladies.” He paused. “This isn’t Hamilton University.”
The fifth song showed another new element of Spencer Krug’s music. He seems to have become a really excellent pianist. There are syncopations, trills, off-rhythm rolls that build to far more complex constructions than the medieval marching music of his keyboard playing for Sunset Rubdown and Wolf Parade. The instrumentals have elongated and become as important and complex as the parts with singing.
The seventh song was really very charming. “I’d say the only word worth singing is a name. I’d say the only name worth singing is not God, it’s you. Julia. As beautiful and simple as the sun. Julia with blue jeans on. I see you there, at the bottom of the stairs.”
The eighth and ninth songs, the last two he played, are, most regrettably, not on his album.
Before beginning the eighth song, Krug said, “This is about moving to Helsinki. It’s about a few other things too.” It includes the lines, “I didn’t know that my old life would become a swarm of swallows thinning out against the Aurora Borealis. That’s right, I said, ‘A swarm of swallows thinning out against the Aurora Borealis’.” Nine months ago, I moved to New York from England. The swarm of swallows thinning out describes perfectly the feeling of the past slipping away to leave only a future.
The ninth song was a rolling mass. For ten minutes, it got faster and more reckless and more glorious. It went, “There is a an ageing arm across a naked waist. There is a fallen tree against a perfect January snow. And that’s as spiritual as I need to be.” And, “Breakwater to the sea, just strong enough to hold back the waves as they roll in. And I should have told you sooner that I’ll be relying on you, baby. And I should have told you sooner I think you’re just like me. And I guess I should have told you I am sometimes just an arm hanging out a window on the highway in the sun. Or a fallen tree against a January snowbank. Or a break water to the sea.”
What this doesn’t convey, what lyrics never convey, is the way all this felt as the music got faster and faster and faster and faster, taking the thematic repetitions and rolling them up into something so rich it was almost unbearable.
I recorded the set on the shitty speaker on my phone. I have listened to the tinny mp3s countless times over the last six months. I have not been able to get the gig out of my head.
A practical introduction to functional programming
Many functional programming articles teach abstract functional techniques. That is, composition, pipelining, higher order functions. This one is different. It shows examples of imperative, unfunctional code that people write every day and translates these examples to a functional style.
The first section of the article takes short, data transforming loops and translates them into functional maps and reduces. The second section takes longer loops, breaks them up into units and makes each unit functional. The third section takes a loop that is a long series of successive data transformations and decomposes it into a functional pipeline.
The examples are in Python, because many people find Python easy to read. A number of the examples eschew pythonicity in order to demonstrate functional techniques common to many languages: map, reduce, pipeline. All of the examples are in Python 2.
A guide rope
When people talk about functional programming, they mention a dizzying number of “functional” characteristics. They mention immutable data1, first class functions2 and tail call optimisation3. These are language features that aid functional programming. They mention mapping, reducing, pipelining, recursing, currying4 and the use of higher order functions. These are programming techniques used to write functional code. They mention parallelization5, lazy evaluation6 and determinism7. These are advantageous properties of functional programs.
Ignore all that. Functional code is characterised by one thing: the absence of side effects. It doesn’t rely on data outside the current function, and it doesn’t change data that exists outside the current function. Every other “functional” thing can be derived from this property. Use it as a guide rope as you learn.
This is an unfunctional function:
a = 0
def increment():
global a
a += 1
This is a functional function:
def increment(a):
return a + 1
Don’t iterate over lists. Use map and reduce.
Map
Map takes a function and a collection of items. It makes a new, empty collection, runs the function on each item in the original collection and inserts each return value into the new collection. It returns the new collection.
This is a simple map that takes a list of names and returns a list of the lengths of those names:
name_lengths = map(len, ["Mary", "Isla", "Sam"])
print name_lengths
# => [4, 4, 3]
This is a map that squares every number in the passed collection:
squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
print squares
# => [0, 1, 4, 9, 16]
This map doesn’t take a named function. It takes an anonymous, inlined function defined with lambda
. The parameters of the lambda are defined to the left of the colon. The function body is defined to the right of the colon. The result of running the function body is (implicitly) returned.
The unfunctional code below takes a list of real names and replaces them with randomly assigned code names.
import random
names = ['Mary', 'Isla', 'Sam']
code_names = ['Mr. Pink', 'Mr. Orange', 'Mr. Blonde']
for i in range(len(names)):
names[i] = random.choice(code_names)
print names
# => ['Mr. Blonde', 'Mr. Blonde', 'Mr. Blonde']
(As you can see, this algorithm can potentially assign the same secret code name to multiple secret agents. Hopefully, this won’t be a source of confusion during the secret mission.)
This can be rewritten as a map:
import random
names = ['Mary', 'Isla', 'Sam']
secret_names = map(lambda x: random.choice(['Mr. Pink',
'Mr. Orange',
'Mr. Blonde']),
names)
Exercise 1. Try rewriting the code below as a map. It takes a list of real names and replaces them with code names produced using a more robust strategy.
names = ['Mary', 'Isla', 'Sam']
for i in range(len(names)):
names[i] = hash(names[i])
print names
# => [6306819796133686941, 8135353348168144921, -1228887169324443034]
(Hopefully, the secret agents will have good memories and won’t forget each other’s secret code names during the secret mission.)
My solution:
names = ['Mary', 'Isla', 'Sam']
secret_names = map(hash, names)
Reduce
Reduce takes a function and a collection of items. It returns a value that is created by combining the items.
This is a simple reduce. It returns the sum of all the items in the collection.
sum = reduce(lambda a, x: a + x, [0, 1, 2, 3, 4])
print sum
# => 10
x
is the current item being iterated over. a
is the accumulator. It is the value returned by the execution of the lambda on the previous item. reduce()
walks through the items. For each one, it runs the lambda on the current a
and x
and returns the result as the a
of the next iteration.
What is a
in the first iteration? There is no previous iteration result for it to pass along. reduce()
uses the first item in the collection for a
in the first iteration and starts iterating at the second item. That is, the first x
is the second item.
This code counts how often the word 'Sam'
appears in a list of strings:
sentences = ['Mary read a story to Sam and Isla.',
'Isla cuddled Sam.',
'Sam chortled.']
sam_count = 0
for sentence in sentences:
sam_count += sentence.count('Sam')
print sam_count
# => 3
This is the same code written as a reduce:
sentences = ['Mary read a story to Sam and Isla.',
'Isla cuddled Sam.',
'Sam chortled.']
sam_count = reduce(lambda a, x: a + x.count('Sam'),
sentences,
0)
How does this code come up with its initial a
? The starting point for the number of incidences of 'Sam'
cannot be 'Mary read a story to Sam and Isla.'
The initial accumulator is specified with the third argument to reduce()
. This allows the use of a value of a different type from the items in the collection.
Why are map and reduce better?
First, they are often one-liners.
Second, the important parts of the iteration - the collection, the operation and the return value - are always in the same places in every map and reduce.
Third, the code in a loop may affect variables defined before it or code that runs after it. By convention, maps and reduces are functional.
Fourth, map and reduce are elemental operations. Every time a person reads a for
loop, they have to work through the logic line by line. There are few structural regularities they can use to create a scaffolding on which to hang their understanding of the code. In contrast, map and reduce are at once building blocks that can be combined into complex algorithms, and elements that the code reader can instantly understand and abstract in their mind. “Ah, this code is transforming each item in this collection. It’s throwing some of the transformations away. It’s combining the remainder into a single output.”
Fifth, map and reduce have many friends that provide useful, tweaked versions of their basic behaviour. For example: filter
, all
, any
and find
.
Exercise 2. Try rewriting the code below using map, reduce and filter. Filter takes a function and a collection. It returns a collection of every item for which the function returned True
.
people = [{'name': 'Mary', 'height': 160},
{'name': 'Isla', 'height': 80},
{'name': 'Sam'}]
height_total = 0
height_count = 0
for person in people:
if 'height' in person:
height_total += person['height']
height_count += 1
if height_count > 0:
average_height = height_total / height_count
print average_height
# => 120
If this seems tricky, try not thinking about the operations on the data. Think of the states the data will go through, from the list of people dictionaries to the average height. Don’t try and bundle multiple transformations together. Put each on a separate line and assign the result to a descriptively-named variable. Once the code works, condense it.
My solution:
people = [{'name': 'Mary', 'height': 160},
{'name': 'Isla', 'height': 80},
{'name': 'Sam'}]
heights = map(lambda x: x['height'],
filter(lambda x: 'height' in x, people))
if len(heights) > 0:
from operator import add
average_height = reduce(add, heights) / len(heights)
Write declaratively, not imperatively
The program below runs a race between three cars. At each time step, each car may move forwards or it may stall. At each time step, the program prints out the paths of the cars so far. After five time steps, the race is over.
This is some sample output:
-
--
--
--
--
---
---
--
---
----
---
----
----
----
-----
This is the program:
from random import random
time = 5
car_positions = [1, 1, 1]
while time:
# decrease time
time -= 1
print ''
for i in range(len(car_positions)):
# move car
if random() > 0.3:
car_positions[i] += 1
# draw car
print '-' * car_positions[i]
The code is written imperatively. A functional version would be declarative. It would describe what to do, rather than how to do it.
Use functions
A program can be made more declarative by bundling pieces of the code into functions.
from random import random
def move_cars():
for i, _ in enumerate(car_positions):
if random() > 0.3:
car_positions[i] += 1
def draw_car(car_position):
print '-' * car_position
def run_step_of_race():
global time
time -= 1
move_cars()
def draw():
print ''
for car_position in car_positions:
draw_car(car_position)
time = 5
car_positions = [1, 1, 1]
while time:
run_step_of_race()
draw()
To understand this program, the reader just reads the main loop. “If there is time left, run a step of the race and draw. Check the time again.” If the reader wants to understand more about what it means to run a step of the race, or draw, they can read the code in those functions.
There are no comments any more. The code describes itself.
Splitting code into functions is a great, low brain power way to make code more readable.
This technique uses functions, but it uses them as sub-routines. They parcel up code. The code is not functional in the sense of the guide rope. The functions in the code use state that was not passed as arguments. They affect the code around them by changing external variables, rather than by returning values. To check what a function really does, the reader must read each line carefully. If they find an external variable, they must find its origin. They must see what other functions change that variable.
Remove state
This is a functional version of the car race code:
from random import random
def move_cars(car_positions):
return map(lambda x: x + 1 if random() > 0.3 else x,
car_positions)
def output_car(car_position):
return '-' * car_position
def run_step_of_race(state):
return {'time': state['time'] - 1,
'car_positions': move_cars(state['car_positions'])}
def draw(state):
print ''
print '\n'.join(map(output_car, state['car_positions']))
def race(state):
draw(state)
if state['time']:
race(run_step_of_race(state))
race({'time': 5,
'car_positions': [1, 1, 1]})
The code is still split into functions, but the functions are functional. There are three signs of this. First, there are no longer any shared variables. time
and car_positions
get passed straight into race()
. Second, functions take parameters. Third, no variables are instantiated inside functions. All data changes are done with return values. race()
recurses3 with the result of run_step_of_race()
. Each time a step generates a new state, it is passed immediately into the next step.
Now, here are two functions, zero()
and one()
:
def zero(s):
if s[0] == "0":
return s[1:]
def one(s):
if s[0] == "1":
return s[1:]
zero()
takes a string, s
. If the first character is '0'
, it returns the rest of the string. If it is not, it returns None
, the default return value of Python functions. one()
does the same, but for a first character of '1'
.
Imagine a function called rule_sequence()
. It takes a string and a list of rule functions of the form of zero()
and one()
. It calls the first rule on the string. Unless None
is returned, it takes the return value and calls the second rule on it. Unless None
is returned, it takes the return value and calls the third rule on it. And so forth. If any rule returns None
, rule_sequence()
stops and returns None
. Otherwise, it returns the return value of the final rule.
This is some sample input and output:
print rule_sequence('0101', [zero, one, zero])
# => 1
print rule_sequence('0101', [zero, zero])
# => None
This is the imperative version of rule_sequence()
:
def rule_sequence(s, rules):
for rule in rules:
s = rule(s)
if s == None:
break
return s
Exercise 3. The code above uses a loop to do its work. Make it more declarative by rewriting it as a recursion.
My solution:
def rule_sequence(s, rules):
if s == None or not rules:
return s
else:
return rule_sequence(rules[0](s), rules[1:])
Use pipelines
In the previous section, some imperative loops were rewritten as recursions that called out to auxiliary functions. In this section, a different type of imperative loop will be rewritten using a technique called a pipeline.
The loop below performs transformations on dictionaries that hold the name, incorrect country of origin and active status of some bands.
bands = [{'name': 'sunset rubdown', 'country': 'UK', 'active': False},
{'name': 'women', 'country': 'Germany', 'active': False},
{'name': 'a silver mt. zion', 'country': 'Spain', 'active': True}]
def format_bands(bands):
for band in bands:
band['country'] = 'Canada'
band['name'] = band['name'].replace('.', '')
band['name'] = band['name'].title()
format_bands(bands)
print bands
# => [{'name': 'Sunset Rubdown', 'active': False, 'country': 'Canada'},
# {'name': 'Women', 'active': False, 'country': 'Canada' },
# {'name': 'A Silver Mt Zion', 'active': True, 'country': 'Canada'}]
Worries are stirred by the name of the function. “format” is very vague. Upon closer inspection of the code, these worries begin to claw. Three things happen in the same loop. The 'country'
key gets set to 'Canada'
. Punctuation is removed from the band name. The band name gets capitalized. It is hard to tell what the code is intended to do and hard to tell if it does what it appears to do. The code is hard to reuse, hard to test and hard to parallelize.
Compare it with this:
print pipeline_each(bands, [set_canada_as_country,
strip_punctuation_from_name,
capitalize_names])
This code is easy to understand. It gives the impression that the auxiliary functions are functional because they seem to be chained together. The output from the previous one comprises the input to the next. If they are functional, they are easy to verify. They are also easy to reuse, easy to test and easy to parallelize.
The job of pipeline_each()
is to pass the bands, one at a time, to a transformation function, like set_canada_as_country()
. After the function has been applied to all the bands, pipeline_each()
bundles up the transformed bands. Then, it passes each one to the next function.
Let’s look at the transformation functions.
def assoc(_d, key, value):
from copy import deepcopy
d = deepcopy(_d)
d[key] = value
return d
def set_canada_as_country(band):
return assoc(band, 'country', "Canada")
def strip_punctuation_from_name(band):
return assoc(band, 'name', band['name'].replace('.', ''))
def capitalize_names(band):
return assoc(band, 'name', band['name'].title())
Each one associates a key on a band with a new value. There is no easy way to do this without mutating the original band. assoc()
solves this problem by using deepcopy()
to produce a copy of the passed dictionary. Each transformation function makes its modification to the copy and returns that copy.
Everything seems fine. Band dictionary originals are protected from mutation when a key is associated with a new value. But there are two other potential mutations in the code above. In strip_punctuation_from_name()
, the unpunctuated name is generated by calling replace()
on the original name. In capitalize_names()
, the capitalized name is generated by calling title()
on the original name. If replace()
and title()
are not functional, strip_punctuation_from_name()
and capitalize_names()
are not functional.
Fortunately, replace()
and title()
do not mutate the strings they operate on. This is because strings are immutable in Python. When, for example, replace()
operates on a band name string, the original band name is copied and replace()
is called on the copy. Phew.
This contrast between the mutability of strings and dictionaries in Python illustrates the appeal of languages like Clojure. The programmer need never think about whether they are mutating data. They aren’t.
Exercise 4. Try and write the pipeline_each
function. Think about the order of operations. The bands in the array are passed, one band at a time, to the first transformation function. The bands in the resulting array are passed, one band at a time, to the second transformation function. And so forth.
My solution:
def pipeline_each(data, fns):
return reduce(lambda a, x: map(x, a),
fns,
data)
All three transformation functions boil down to making a change to a particular field on the passed band. call()
can be used to abstract that. It takes a function to apply and the key of the value to apply it to.
set_canada_as_country = call(lambda x: 'Canada', 'country')
strip_punctuation_from_name = call(lambda x: x.replace('.', ''), 'name')
capitalize_names = call(str.title, 'name')
print pipeline_each(bands, [set_canada_as_country,
strip_punctuation_from_name,
capitalize_names])
Or, if we are willing to sacrifice readability for conciseness, just:
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
call(lambda x: x.replace('.', ''), 'name'),
call(str.title, 'name')])
The code for call()
:
def assoc(_d, key, value):
from copy import deepcopy
d = deepcopy(_d)
d[key] = value
return d
def call(fn, key):
def apply_fn(record):
return assoc(record, key, fn(record.get(key)))
return apply_fn
There is a lot going on here. Let’s take it piece by piece.
One. call()
is a higher order function. A higher order function takes a function as an argument, or returns a function. Or, like call()
, it does both.
Two. apply_fn()
looks very similar to the three transformation functions. It takes a record (a band). It looks up the value at record[key]
. It calls fn
on that value. It assigns the result back to a copy of the record. It returns the copy.
Three. call()
does not do any actual work. apply_fn()
, when called, will do the work. In the example of using pipeline_each()
above, one instance of apply_fn()
will set 'country'
to 'Canada'
on a passed band. Another instance will capitalize the name of a passed band.
Four. When an apply_fn()
instance is run, fn
and key
will not be in scope. They are neither arguments to apply_fn()
, nor locals inside it. But they will still be accessible. When a function is defined, it saves references to the variables it closes over: those that were defined in a scope outside the function and that are used inside the function. When the function is run and its code references a variable, Python looks up the variable in the locals and in the arguments. If it doesn’t find it there, it looks in the saved references to closed over variables. This is where it will find fn
and key
.
Five. There is no mention of bands in the call()
code. That is because call()
could be used to generate pipeline functions for any program, regardless of topic. Functional programming is partly about building up a library of generic, reusable, composable functions.
Good job. Closures, higher order functions and variable scope all covered in the space of a few paragraphs. Have a nice glass of lemonade.
There is one more piece of band processing to do. That is to remove everything but the name and country. extract_name_and_country()
can pull that information out:
def extract_name_and_country(band):
plucked_band = {}
plucked_band['name'] = band['name']
plucked_band['country'] = band['country']
return plucked_band
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
call(lambda x: x.replace('.', ''), 'name'),
call(str.title, 'name'),
extract_name_and_country])
# => [{'name': 'Sunset Rubdown', 'country': 'Canada'},
# {'name': 'Women', 'country': 'Canada'},
# {'name': 'A Silver Mt Zion', 'country': 'Canada'}]
extract_name_and_country()
could have been written as a generic function called pluck()
. pluck()
would be used like this:
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
call(lambda x: x.replace('.', ''), 'name'),
call(str.title, 'name'),
pluck(['name', 'country'])])
Exercise 5. pluck()
takes a list of keys to extract from each record. Try and write it. It will need to be a higher order function.
My solution:
def pluck(keys):
def pluck_fn(record):
return reduce(lambda a, x: assoc(a, x, record[x]),
keys,
{})
return pluck_fn
What now?
Functional code co-exists very well with code written in other styles. The transformations in this article can be applied to any code base in any language. Try applying them to your own code.
Think of Mary, Isla and Sam. Turn iterations of lists into maps and reduces.
Think of the race. Break code into functions. Make those functions functional. Turn a loop that repeats a process into a recursion.
Think of the bands. Turn a sequence of operations into a pipeline.
1 An immutable piece of data is one that cannot be changed. Some languages, like Clojure, make all values immutable by default. Any “mutating” operations copy the value, change it and pass back the changed copy. This eliminates bugs that arise from a programmer’s incomplete model of the possible states their program may enter.
2 Languages that support first class functions allow functions to be treated like any other value. This means they can be created, passed to functions, returned from functions and stored inside data structures.
3 Tail call optimisation is a programming language feature. Each time a function recurses, a new stack frame is created. A stack frame is used to store the arguments and local values for the current function invocation. If a function recurses a large number of times, it is possible for the interpreter or compiler to run out of memory. Languages with tail call optimisation reuse the same stack frame for their entire sequence of recursive calls. Languages like Python that do not have tail call optimisation generally limit the number of times a function may recurse to some number in the thousands. In the case of the race()
function, there are only five time steps, so it is safe.
4 Currying means decomposing a function that takes multiple arguments into a function that takes the first argument and returns a function that takes the next argument, and so forth for all the arguments.
5 Parallelization means running the same code concurrently without synchronization. These concurrent processes are often run on multiple processors.
6 Lazy evaluation is a compiler technique that avoids running code until the result is needed.
7 A process is deterministic if repetitions yield the same result every time.
Little Lisp interpreter
Little Lisp is an interpreter that supports function invocation, lambdas, lets, ifs, numbers, strings, a few library functions, and lists. I wrote it for a lightning talk at the Recurse Center to show how easy it is to write an interpreter. The code is 116 lines of JavaScript. I will explain how it works.
First, let’s learn some Lisp.
An atom, the simplest Lisp form:
1
Another atom. This time, a string:
"a"
An empty list:
()
A list containing an atom:
(1)
A list containing two atoms:
(1 2)
A list containing an atom and another list:
(1 (2))
A function invocation. This comprises a list where the first element is the function and the rest of the elements are the arguments. first
takes one argument, (1 2)
, and returns 1
.
(first (1 2))
=> 1
A lambda. That is: a function definition. The lambda takes a parameter, x
, and just returns it.
(lambda (x)
x)
A lambda invocation. This comprises a list where the first element is a lambda and the rest of the elements are the arguments. The lambda takes one argument, "Lisp"
, and returns it.
((lambda (x)
x)
"Lisp")
=> "Lisp"
Writing a Lisp interpreter is really easy.
The code for Little Lisp has two parts: the parser and the interpreter.
The parser.
Parsing has two phases: tokenizing and parenthesizing.
tokenize()
takes a string of Lisp code, puts spaces around every parenthesis and splits on whitespace. For example, it takes something like ((lambda (x) x) "Lisp")
, transforms it into ` ( ( lambda ( x ) x ) “Lisp” ) ` and transforms that into ['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']
.
var tokenize = function(input) {
return input.replace(/\(/g, ' ( ')
.replace(/\)/g, ' ) ')
.trim()
.split(/\s+/);
};
parenthesize()
takes the tokens produced by tokenize()
and produces a nested array that mimics the structure of the Lisp code. Each atom in the nested array is labelled as an identifier or a literal. For example, ['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']
is transformed into:
[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }],
{ type: 'literal', value: 'Lisp' }]
parenthesize()
goes through the tokens, one by one. If the current token is an opening parenthesis, it starts building a new array. If the current token is an atom, it labels it with its type and appends it to the current array. If the current token is a closing parenthesis, it stops building the current array and continues building the enclosing array.
var parenthesize = function(input, list) {
if (list === undefined) {
return parenthesize(input, []);
} else {
var token = input.shift();
if (token === undefined) {
return list.pop();
} else if (token === "(") {
list.push(parenthesize(input, []));
return parenthesize(input, list);
} else if (token === ")") {
return list;
} else {
return parenthesize(input, list.concat(categorize(token)));
}
}
};
When parenthesize()
is first called, the input
parameter contains the array of tokens returned by tokenize()
. For example:
['(', '(', 'lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']
When parenthesize()
is first called, the list
parameter is undefined. Lines 2-3 run and parenthesize()
recurses with list
set to an empty array.
In the recursion, line 5 runs and removes the first opening parenthesis from input
. Line 9 starts a new, empty list by recursing with a new, empty array.
In the recursion, line 5 runs and removes another opening parenthesis from input
. Line 9 starts another new, empty list by recursing with another new, empty array.
In the recursion, input
is ['lambda', '(', 'x', ')', 'x', ')', '"Lisp"', ')']
. Line 14 runs with token
set to lambda
. It calls categorize()
and passes lambda
as the input
argument. Line 7 of categorize()
runs and returns an object with type
set to identifier
and value
set to lambda
.
var categorize = function(input) {
if (!isNaN(parseFloat(input))) {
return { type:'literal', value: parseFloat(input) };
} else if (input[0] === '"' && input.slice(-1) === '"') {
return { type:'literal', value: input.slice(1, -1) };
} else {
return { type:'identifier', value: input };
}
};
Line 14 of parenthesize()
appends to list
the object returned by categorize()
and recurses with the rest of the input and list
.
var parenthesize = function(input, list) {
if (list === undefined) {
return parenthesize(input, []);
} else {
var token = input.shift();
if (token === undefined) {
return list.pop();
} else if (token === "(") {
list.push(parenthesize(input, []));
return parenthesize(input, list);
} else if (token === ")") {
return list;
} else {
return parenthesize(input, list.concat(categorize(token)));
}
}
};
In the recursion, the next token is a parenthesis. Line 9 of parenthesize()
starts a new, empty list by recursing with an empty array. In the recursion, input
is ['x', ')', 'x', ')', '"Lisp"', ')']
. Line 14 runs with token
set to x
. It makes a new object with a value of x
and a type of identifier
. It appends this object to list
and recurses.
In the recursion, the next token is a closing parenthesis. Line 12 runs and returns the completed list
: [{ type: 'identifier', value: 'x' }]
.
parenthesize()
continues recursing until it has processed all of the input tokens. It returns the nested array of typed atoms.
parse()
is the successive application of tokenize()
and parenthesize()
:
var parse = function(input) {
return parenthesize(tokenize(input));
};
Given a starting input of ((lambda (x) x) "Lisp")
, the final output of the parser is:
[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }],
{ type: 'literal', value: 'Lisp' }]
The interpreter.
After parsing is complete, interpreting begins.
interpret()
receives the output of parse()
and executes it. Given the output from the parsing example above, interpret()
would construct a lambda and invoke it with the argument "Lisp"
. The lambda invocation would return "Lisp"
, which would be the output of the whole program.
As well as the input to execute, interpret()
receives an execution context. This is the place where variables and their values are stored. When a piece of Lisp code is executed by interpret()
, the execution context contains the variables that are accessible to that code.
These variables are stored in a hierarchy. Variables in the current scope are at the bottom of the hierarchy. Variables in the enclosing scope are in the level above. Variables in the scope enclosing the enclosing scope are in the level above that. And so on. For example, in the following code:
((lambda (a)
((lambda (b)
(b a))
"b"))
"a")
On line 3, the execution context has two active scopes. The inner lambda forms the current scope. The outer lambda forms an enclosing scope. The current scope has b
bound to "b"
. The enclosing scope has a
bound to "a"
. When line 3 runs, the interpreter tries to look up b
in the context. It checks the current scope, finds b
and returns its value. Still on line 3, the interpreter tries to look up a
. It checks the current scope and does not find a
, so it tries the enclosing scope. There, it finds a
and returns its value.
In Little Lisp, the execution context is modeled with an object made by calling the Context
constructor. This takes scope
, an object that contains variables and their values in the current scope. And it takes parent
. If parent
is undefined
, the scope is the top, or global, or outermost one.
var Context = function(scope, parent) {
this.scope = scope;
this.parent = parent;
this.get = function(identifier) {
if (identifier in this.scope) {
return this.scope[identifier];
} else if (this.parent !== undefined) {
return this.parent.get(identifier);
}
};
};
We have seen how ((lambda (x) x) "Lisp")
gets parsed. Let us see how the parsed code gets executed.
var interpret = function(input, context) {
if (context === undefined) {
return interpret(input, new Context(library));
} else if (input instanceof Array) {
return interpretList(input, context);
} else if (input.type === "identifier") {
return context.get(input.value);
} else {
return input.value;
}
};
The first time interpret()
is called, context
is undefined
. Lines 2-3 are run to make an execution context.
When the initial context is instantiated, the constructor function takes the library
object. This contains the functions built in to the language: first
, rest
and print
. These functions are written in JavaScript.
interpret()
recurses with the original input
and the new context
.
input
contains the full example output from the parsing section:
[[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }],
{ type: 'literal', value: 'Lisp' }]
Because input
is an array and context
is defined, lines 4-5 are run and interpretList()
is called.
var interpretList = function(input, context) {
if (input.length > 0 && input[0].value in special) {
return special[input[0].value](input, context);
} else {
var list = input.map(function(x) { return interpret(x, context); });
if (list[0] instanceof Function) {
return list[0].apply(undefined, list.slice(1));
} else {
return list;
}
}
};
In interpretList()
, line 5 maps over the input array and calls interpret()
on each element. When interpret()
is called on the lambda definition, interpretList()
gets called again. This time, the input
argument to interpretList()
is:
[{ type: 'identifier', value: 'lambda' }, [{ type: 'identifier', value: 'x' }],
{ type: 'identifier', value: 'x' }]
Line 3 of interpretList()
gets called, because lambda
, the first element in the array, is a special form. special.lambda()
is called to create the lambda function.
var special = {
lambda: function(input, context) {
return function() {
var lambdaArguments = arguments;
var lambdaScope = input[1].reduce(function(acc, x, i) {
acc[x.value] = lambdaArguments[i];
return acc;
}, {});
return interpret(input[2], new Context(lambdaScope, context));
};
}
};
special.lambda()
takes the part of the input that defines the lambda. It returns a function that, when invoked, invokes the lambda on some arguments.
Line 3 begins the definition of the lambda invocation function. Line 4 stores the arguments passed to the lambda invocation. Line 5 starts creating a new scope for the lambda’s invocation. It reduces over the part of the input that defines the parameters of the lambda: [{ type: 'identifier', value: 'x' }]
. It adds a key/value pair to the lambda scope for each lambda parameter in input
and argument passed to the lambda. Line 10 invokes the lambda by calling interpret()
on the lambda body: { type: 'identifier', value: 'x' }
. It passes in the lambda context that contains the lambda’s scope and the parent context.
The lambda is now represented by the function returned by special.lambda()
.
interpretList()
continues mapping over the input
array by calling interpret()
on the second element of the list: the "Lisp"
string.
var interpret = function(input, context) {
if (context === undefined) {
return interpret(input, new Context(library));
} else if (input instanceof Array) {
return interpretList(input, context);
} else if (input.type === "identifier") {
return context.get(input.value);
} else {
return input.value;
}
};
This runs line 9 of interpret()
which just returns the value
attribute of the literal object: 'Lisp'
. The map operation on line 5 of interpretList()
is complete. list
is:
[function(args) { /* code to invoke lambda */ },
'Lisp']
Line 6 of interpretList()
runs and finds that the first element of list
is a JavaScript function. This means that the list is an invocation. Line 7 runs and invokes the lambda, passing the rest of list
as arguments.
var interpretList = function(input, context) {
if (input.length > 0 && input[0].value in special) {
return special[input[0].value](input, context);
} else {
var list = input.map(function(x) { return interpret(x, context); });
if (list[0] instanceof Function) {
return list[0].apply(undefined, list.slice(1));
} else {
return list;
}
}
};
In the lambda invocation function, line 8 calls interpret()
on the lambda body, { type: 'identifier', value: 'x' }
.
function() {
var lambdaArguments = arguments;
var lambdaScope = input[1].reduce(function(acc, x, i) {
acc[x.value] = lambdaArguments[i];
return acc;
}, {});
return interpret(input[2], new Context(lambdaScope, context));
};
Line 6 of interpret()
finds that input
is an identifier atom. Line 7 looks up the identifier, x
, in context
and returns 'Lisp'
.
var interpret = function(input, context) {
if (context === undefined) {
return interpret(input, new Context(library));
} else if (input instanceof Array) {
return interpretList(input, context);
} else if (input.type === "identifier") {
return context.get(input.value);
} else {
return input.value;
}
};
'Lisp'
is returned by the lambda invocation function, which is returned by interpretList()
, which is returned by interpret()
, and that’s it.
Go to the GitHub repository to see all the code. And look at lis.py, the dazzlingly simple Scheme interpreter that Peter Norvig wrote in Python.
Coquette
I’ve just released Coquette, a micro framework for JavaScript games. It handles collision detection, the game update loop, keyboard input and canvas rendering. I wrote it in a panic when, two hours before the deadline of April’s Ludum Dare, I discovered that entries cannot use closed source library code.
A pop quiz purporting to be about scope in JavaScript that is actually a polemic about why modules are a good idea
This is the blog post version of a lightning talk I gave at the Recurse Center last week.
I will ask you some questions. For each question, there will be some JavaScript files loaded into an HTML page. In these JavaScript files, there will be some console.log()
invocations. Your job is to predict what happens after each invocation.
For this first question, there is one JavaScript file, a.js
, loaded into this HTML page:
<!doctype html>
<html>
<head>
<script src="a.js"></script>
</head>
</html>
One
// a.js var blah = "blah"; console.log(blah); // Option 1: ReferenceError: blah is not defined // Option 2: print 'blah' to the console
blah
will be printed. The variable, blah
, is in the same scope as the console.log()
invocation.
For this rest of the questions, there will be two JavaScript files, a.js
and b.js
, loaded into this HTML page:
<!doctype html>
<html>
<head>
<script src="a.js"></script>
<script src="b.js"></script>
</head>
</html>
Two
// a.js
blah = "blah";
// b.js
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah
will be printed. It doesn’t matter that blah
was declared in a different file. All files in JavaScript are loaded into a shared scope.
Three
// a.js
var blah = "blah";
// b.js
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah
will be printed. It doesn’t matter that the var
keyword was used to declare blah
. The declaration was in a shared scope, and that scope is global, so blah
becomes a global variable.
Four
// a.js
var blahFn = function() {
return "blah";
};
// b.js
console.log(blahFn()); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah
will be printed. All the logic shown so far applies to functions.
Five
// a.js
function blahFn() {
var blah = "blah";
};
// b.js
blahFn();
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
There will be a reference error. The blah
variable is created inside a function, in the function’s own, private scope. It is not available in the global scope.
Six
// a.js
function blahFn() {
blah = "blah";
};
// b.js
blahFn();
console.log(blah); // Option 1: ReferenceError: blah is not defined
// Option 2: print 'blah' to the console
blah
will be printed. With the var keyword omitted, the blah
variable goes into the global scope.
Seven
// a.js
;(function() {
function blahFn() {
return "blah";
};
console.log(blahFn()); // Option 1: ReferenceError: blahFn is not defined
// Option 2: print 'blah' to the console
})();
a.js
is loaded. The anonymous function at the top is invoked, creating our first JavaScript module. The module instantiates blahFn
. The console.log()
invocation prints blah
because the code inside a module has a shared scope.
// b.js
console.log(blahFn()); // Option 1: ReferenceError: blahFn is not defined
// Option 2: print 'blah' to the console
When console.log()
is invoked in b.js
, a ReferenceError is thrown. The variables inside the module in a.js
are locked away. The global scope is not polluted.
Eight
// a.js
;(function(exports) {
var Blah = {};
Blah.blahFn = function() {
return "blah";
};
exports.Blah = Blah;
})(this);
// b.js
console.log(Blah.blahFn()); // Option 1: ReferenceError: Blah is not defined
// Option 2: print 'blah' to the console
When the console.log()
is invoked, blah
is printed. The module has made itself available as an object in the global scope. It has also made available one of its variables on this object.
To do this, it made an empty object, Blah
, and attached blahFn
to it as an attribute. The anonymous function enclosing the module was passed this
. Because the anonymous function was invoked in the global context, this
is the global object, also referenced with window
. The module attached Blah
to the exports/global/window
object, making it available to the console.log()
invocation.
Nine
// a.js
(function woo() {
return function() {
console.log("woo")
};
})
// b.js
(function() {
console.log("boo")
})();
What is printed? woo
or boo
?
It’s boo
. The code in a.js
makes a function called woo
and does nothing with it.
In questions seven and eight, both modules were preceded by a semicolon. Why is this?
Often, a module is put in its own JavaScript file. Often, JavaScript files that are part of a website are joined together. This is part of a process to reduce download time. If you join the a.js
and b.js
together and then execute that, woo
will be printed. The operation of the boo module was changed by the random code preceding it in the a.js
file. Because the random code in a.js
had no semicolon at the end, the parentheses around the boo module invoked the woo
function. This returned the anonymous function inside the woo
function which was then invoked by the parentheses sitting at the end of b.js
.
Modules protect themselves from these sorts of nightmares with a leading semicolon.
The stabbing in Shadow of the Colossus
You finally realise how to get off the ground and onto the legs of the colossus. Maybe you need to shoot arrows into the soles of its feet so it kneels in pain. Maybe you need to hide under an archway, wait for it to stoop down low and make a leap for its beard.
You enact your plan, haul yourself up and begin climbing the body of the colossus. As you go, you might occasionally find part-time horizontal surfaces - armour, shackles - upon which you can rest and regain your strength. Or, you might plan your route so you can shimmy up as fast as possible, avoiding the limbs that sometimes flail and force you to stop climbing and just cling on.
You reach the top. The shoulders of a biped. The back of a quadruped. This moment is magic. You stagger across this great roiling mass that feels like moving rock. You cling on as it rears up. You creep over it, whispering, “stay calm”, praying it will not buck.
Being at the top is relatively safe. Though you must cling on when the colossus bridles, there is always a reprieve when you can stand and regain your strength. Your aim is now to reach one of the glowing tattoos on the colossus’s body. To get to a tattoo, you must leave your perch and climb to the crown of a swaying head, or a flank that can only be reached from above. These forays are a commitment. Once you set out, you have a ration of strength that cannot be replenished until you return. If your strength runs out during your expedition, you will lose your grip, fall to the ground and possibly die.
You reach a tattoo. To kill the colossus, you must repeatedly stab it here. You raise your sword. The longer you hold it aloft, the more powerful the stab will be. The more powerful the stab is, the fewer stabs you will need.
This piece of the game is particularly beautifully designed. If the colossus bucks while your sword is raised, you must lower your sword and cling on. This means you have wasted grip strength. Having your sword raised is a tense balancing act where you want to wait as long as possible, but no longer, before you strike. There is an elegant mapping between the sword and your controller. You press and release X to raise your sword. You press X again to stab. A more intuitive system would have you press and hold X to keep the sword raised. But the way the controls actually work is better. Stabbing the tattoo is such a cathartic, primal screaming moment, to attach it to the release of a button would diminish its power. The way it is, you jam your finger back down on X like you really are astride the colossus’s neck, like it is about to throw you, like it is the only thing stopping you from winning back your girlfriend.