cs-4400-sp26

Lecture 2: IfLang, LetLang, Scope and Substitution

What goes into a language?

So far, our CalcLang “language” doesn’t have most of what we think of as necessary to be a programming language. What are some common features it’s missing? (Discussion)

Possible answers:

Commands
Variables
Loops
Conditionals
Functions
(other data, like strings, booleans, lists, arrays, etc.)
Objects
Classes/interfaces

Today, we will focus on two of these:

Conditionals, with an if-then-else construct
Variables, locally scoped, with a “let” construct

If we gave this language a reasonable concrete syntax, we could write programs such as:

let
  x = 4
  y = x + 2
in
  if (y > 3) then
    (y * 2)
  else
    let
      z = 10
    in
      (x * y + z)

IfLang

First, we will introduce an if construct. To have something to put as the guard of an if, we also need constants representing true and false. Here is the syntax of our language:

(define-type IfLang
  (numI [value : Number])
  (boolI [value : Boolean])
  (addI [e1 : IfLang]
        [e2 : IfLang])
  (ifI [guard : IfLang]
       [thn : IfLang]
       [els : IfLang]))

The answers, or final results of a program, are sometimes called the values of the language. They can be thought of as a sub-syntax of expressions: exactly those expressions that have no more computation left to do. In this language, values are literal numbers and booleans.

(define-type Value
  (numV [v : Number])
  (boolV [v : Boolean]))

Together, we will write (and test) the following interpreter for the language.

(interp-ite : (IfLang -> Value))
(define (interp-ite e)
  (type-case IfLang e
    [(numI n) (numV n)]
    [(boolI b) (boolV b)]
    [(addI e1 e2)
     (let ([v1 (interp-ite e1)]
           [v2 (interp-ite e2)])
       ;; addition is only defined on numbers
       (if (and (numV? v1) (numV? v2))
           (numV (+ (numV-v v1) (numV-v v2)))
           (error 'runtime "can't add non-numbers")))]
    [(ifI guard thn els)
     (let ([guardv (interp-ite guard)])
       ;; the guard must be a boolean; only the chosen branch is evaluated
       (if (boolV? guardv)
           (if (boolV-v guardv) (interp-ite thn) (interp-ite els))
           (error 'runtime "can't have non-boolean guard")))]))
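
Since the plan is to test the interpreter as well as write it, here is a minimal sketch of what such tests might look like. The definitions are repeated so the block runs on its own; the particular test cases are our own choice, not a fixed part of the lecture.

```racket
#lang plait

(define-type IfLang
  (numI [value : Number])
  (boolI [value : Boolean])
  (addI [e1 : IfLang] [e2 : IfLang])
  (ifI [guard : IfLang] [thn : IfLang] [els : IfLang]))

(define-type Value
  (numV [v : Number])
  (boolV [v : Boolean]))

(interp-ite : (IfLang -> Value))
(define (interp-ite e)
  (type-case IfLang e
    [(numI n) (numV n)]
    [(boolI b) (boolV b)]
    [(addI e1 e2)
     (let ([v1 (interp-ite e1)]
           [v2 (interp-ite e2)])
       (if (and (numV? v1) (numV? v2))
           (numV (+ (numV-v v1) (numV-v v2)))
           (error 'runtime "can't add non-numbers")))]
    [(ifI guard thn els)
     (let ([guardv (interp-ite guard)])
       (if (boolV? guardv)
           (if (boolV-v guardv) (interp-ite thn) (interp-ite els))
           (error 'runtime "can't have non-boolean guard")))]))

;; literals evaluate to themselves
(test (interp-ite (numI 5)) (numV 5))
(test (interp-ite (boolI #f)) (boolV #f))
;; addition on numbers
(test (interp-ite (addI (numI 1) (numI 2))) (numV 3))
;; the guard picks a branch
(test (interp-ite (ifI (boolI #t) (numI 1) (numI 2))) (numV 1))
;; the branch that is not taken is never evaluated, even if it would fail
(test (interp-ite (ifI (boolI #f) (addI (boolI #t) (numI 1)) (numI 2)))
      (numV 2))
;; run-time errors
(test/exn (interp-ite (addI (boolI #t) (numI 1))) "can't add non-numbers")
(test/exn (interp-ite (ifI (numI 0) (numI 1) (numI 2))) "non-boolean guard")
```

Note the fifth test: because interp-ite only recurs into the chosen branch, an ill-typed expression in the other branch goes unnoticed.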

LetLang

Implementing the scoped variables present in let-expressions requires understanding some new concepts. We will return to and extend our notion of Abstract Syntax Tree (AST), generalizing to something called an Abstract Binding Tree (ABT).

For example, consider the expression

let x = 10 in x + x

We could represent this expression as the following AST:

Let
	Symbol: x
	Assn:
		Num 10
	Body:
		Plus
			Var x
			Var x

But what about the following expression?

let x = 10 in
	(let y = 10 in y)
	+
	y

You all know from your experience writing code that this isn’t a valid program: y is out of scope. But this is a perfectly good syntax tree:

Let
	Symbol: x
	Assn:
		Num 10
	Body:
		Plus
			Let
				Symbol: y
				Assn:
					Num 10
				Body:
					Var y
			Var y

TPS: can we describe a characteristic of these trees that makes them well-scoped?

Answer: Here is the criterion for a well-scoped let-expression: every Var must appear inside the Body of a Let that introduces its symbol.

We can represent this criterion visually, with arrows that point “backwards” in the tree from the use of a variable to its binding site.

In fact, this is exactly the notation that DrRacket will give us when we hover over a variable.

Note that if we do this, we don’t actually need names anymore!

Here’s how to represent let x = 10 in (let y = 10 in y) + x as an ABT:

Let* <-------------------------------
	Assn:                           |
		Num 10                      |
	Body:                           |
		Plus                        |
			Let*  <-------------    |
				Assn:           |   |
					Num 10      |   |
				Body:           |   |
					Var  --------   |
			Var --------------------|

Our implementation won’t, for now, check this property of syntax. However, the interpreter will generate a run-time error if it encounters an unbound variable.
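
Although our implementation skips this check, it may help to see roughly what one could look like. The following is only a sketch: it uses the LetExpr type that appears later in these notes, and the names well-scoped-in? and well-scoped? are hypothetical helpers made up for this illustration. The idea is to walk the tree while carrying the list of symbols bound by enclosing Lets.

```racket
#lang plait

(define-type LetExpr
  (numE [n : Number])
  (addE [l : LetExpr] [r : LetExpr])
  (varE [s : Symbol])
  (letE [var : Symbol] [assignment : LetExpr] [body : LetExpr]))

;; well-scoped-in? (hypothetical helper): is every varE in e either
;; bound by an enclosing letE inside e, or listed in bound?
(well-scoped-in? : (LetExpr (Listof Symbol) -> Boolean))
(define (well-scoped-in? e bound)
  (type-case LetExpr e
    [(numE n) #t]
    [(addE l r) (and (well-scoped-in? l bound)
                     (well-scoped-in? r bound))]
    [(varE s) (member s bound)]
    [(letE x assn body)
     ;; x is in scope only in the body, not in its own assignment
     (and (well-scoped-in? assn bound)
          (well-scoped-in? body (cons x bound)))]))

;; A whole program starts with no variables in scope
(define (well-scoped? e)
  (well-scoped-in? e empty))

;; let x = 10 in x + x
(test (well-scoped? (letE 'x (numE 10) (addE (varE 'x) (varE 'x)))) #t)

;; let x = 10 in (let y = 10 in y) + y   -- the second y is out of scope
(test (well-scoped?
       (letE 'x (numE 10)
             (addE (letE 'y (numE 10) (varE 'y))
                   (varE 'y))))
      #f)
```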

Running LetLang expressions

When we run the following program, what answer do we expect?

let x = 10 in x + x

Again, just from your experience programming, the answer seems obvious: 20. But now that you’ve seen how interpreters work, can you come up with a mechanistic explanation for that behavior?

The PL word for what we need is substitution: we substitute the value 10 for the variable x in the body of the expression, x + x. Substitution is like find-and-replace: it crawls the structure of the term and replaces every occurrence of the variable with the expression we’re substituting. In a moment, we’ll think about how to implement this.

But first, there’s another question we need to resolve. How should this expression run? [TPS]

let x = 10 + 20 in x + 1

There are two seemingly-reasonable answers:

First, evaluate 10 + 20. Then, substitute the result for x in x + 1 and continue evaluating.
Substitute the expression 10 + 20 for x in x + 1, then continue evaluating.

[TPS] Why might one of these be favorable compared to the other? Can you think of situations where one seems better, and situations where the other seems better?

This difference is called eager versus lazy evaluation of let-bindings, or sometimes call by value (CBV) versus call by name (CBN). This is the first real design decision we have seen, and it’s a serious one that PL designers and researchers argue and write papers about.
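
To make the trade-off concrete, here is a small experiment, offered as a sketch rather than as the interpreter we will build. It uses the LetExpr type that appears later in these notes. Everything else (subst-expr, the eager? flag, the add-count counter, and run) is a hypothetical name invented for this illustration. Both strategies compute the same number here, but they perform different amounts of work.

```racket
#lang plait

(define-type LetExpr
  (numE [n : Number])
  (addE [l : LetExpr] [r : LetExpr])
  (varE [s : Symbol])
  (letE [var : Symbol] [assignment : LetExpr] [body : LetExpr]))

;; Counts how many additions the interpreter performs (illustration only)
(define add-count (box 0))

;; Replace occurrences of id in body with the expression what, stopping
;; at lets that rebind id. Assumes what is closed (no free variables),
;; which holds whenever the whole program is well-scoped.
(subst-expr : (Symbol LetExpr LetExpr -> LetExpr))
(define (subst-expr id what body)
  (type-case LetExpr body
    [(numE n) (numE n)]
    [(addE l r) (addE (subst-expr id what l) (subst-expr id what r))]
    [(varE x) (if (equal? x id) what (varE x))]
    [(letE x assn inner)
     (letE x
           (subst-expr id what assn)
           (if (equal? x id) inner (subst-expr id what inner)))]))

;; eager? = #t: evaluate the assignment first, then substitute the result
;; eager? = #f: substitute the unevaluated assignment
(interp : (Boolean LetExpr -> Number))
(define (interp eager? e)
  (type-case LetExpr e
    [(numE n) n]
    [(addE l r)
     (begin
       (set-box! add-count (+ 1 (unbox add-count)))
       (+ (interp eager? l) (interp eager? r)))]
    [(varE x) (error 'runtime "unbound variable")]
    [(letE x assn body)
     (if eager?
         (interp eager? (subst-expr x (numE (interp eager? assn)) body))
         (interp eager? (subst-expr x assn body)))]))

;; Reset the counter, run e, and return (pair result additions-performed)
(define (run eager? e)
  (begin
    (set-box! add-count 0)
    (let ([result (interp eager? e)])
      (pair result (unbox add-count)))))

;; let x = 10 + 20 in x + x
(define used-twice
  (letE 'x (addE (numE 10) (numE 20)) (addE (varE 'x) (varE 'x))))
(test (run #t used-twice) (pair 60 2)) ; eager: 10 + 20 computed once
(test (run #f used-twice) (pair 60 3)) ; lazy: 10 + 20 computed twice

;; let x = 10 + 20 in 1
(define unused
  (letE 'x (addE (numE 10) (numE 20)) (numE 1)))
(test (run #t unused) (pair 1 1)) ; eager: computes a value nobody uses
(test (run #f unused) (pair 1 0)) ; lazy: skips the unused work
```

So eager evaluation wins when a variable is used more than once, and lazy evaluation wins when it is not used at all.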

In this class, we will default to using a CBV semantics, for reasons that are a bit subtle to get into at this time, but that we will come back to.

Under a CBV semantics, variables can be thought of as standing for values. That is, an un-evaluated expression will never get dropped in to replace a variable. This simplifies our notion of substitution to some extent, because we only need to define it for the case where the “data being substituted” is a value.

One more thing: shadowing

What about this expression? What should it run to?

let x = 10 in let x = 20 in x

In most languages, this would run to 20. That’s because we assume that when a name is defined twice, any reference to it refers to the “innermost”, or most recent, definition. The ABT here would look like

            let*
            /    \
            10   let* <---\
                    / \    |
                    20 *---/

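So, this is something to keep in mind when we run substitution: we shouldn’t substitute into the body of a let-binding that binds the same name as the variable we’re substituting for. (The assignment expression of that let is still outside the scope of the new binding, so substitution does continue there.)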

Ok, we have the concepts we need now to implement LetLang in Plait.

Plait Implementation of LetLang

To make our lives a little easier, we’ll take out our new IfLang constructs and just go back to all answers being numbers. We’ll write the following code together.

(define-type LetExpr
  (numE [n : Number])
  (addE [l : LetExpr] [r : LetExpr])
  (varE [s : Symbol])
  (letE [var : Symbol]
        [assignment : LetExpr]
        [body : LetExpr]))

(subst : (Symbol Number LetExpr -> LetExpr))
(define (subst id assignment body)
  (type-case LetExpr body
    [(numE n) (numE n)]
    [(addE e1 e2)
     (addE (subst id assignment e1)
           (subst id assignment e2))]
    [(varE x)
     ;; only the variable being substituted for is replaced
     (if (equal? x id) (numE assignment) (varE x))]
    [(letE x assn inner)
     ;; assn is outside the scope of x, so we always substitute there
     (if (equal? id x)
         (letE x (subst id assignment assn) inner) ;; x shadows id: leave the body alone
         (letE x (subst id assignment assn)
               (subst id assignment inner)))]))

(interp-let : (LetExpr -> Number))
(define (interp-let e)
  (type-case LetExpr e
    [(numE n) n]
    [(addE e1 e2)
     (+ (interp-let e1)
        (interp-let e2))]
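    [(varE id)
     ;; bound variables are replaced by subst before we reach them,
     ;; so any variable we encounter here must be unbound
     (error 'runtime "unbound variable")]
    [(letE var assignment body)
     ;; eager: evaluate the assignment, then substitute its value into the body
     (interp-let (subst var (interp-let assignment) body))]))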
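
As a sanity check, the examples from earlier in these notes can be turned into tests. This is a sketch with test cases of our own choosing. The definitions are repeated so the block runs on its own, and the let case of subst names its body inner only to keep it distinct from the function's body parameter.

```racket
#lang plait

(define-type LetExpr
  (numE [n : Number])
  (addE [l : LetExpr] [r : LetExpr])
  (varE [s : Symbol])
  (letE [var : Symbol] [assignment : LetExpr] [body : LetExpr]))

(subst : (Symbol Number LetExpr -> LetExpr))
(define (subst id assignment body)
  (type-case LetExpr body
    [(numE n) (numE n)]
    [(addE e1 e2)
     (addE (subst id assignment e1)
           (subst id assignment e2))]
    [(varE x)
     (if (equal? x id) (numE assignment) (varE x))]
    [(letE x assn inner)
     (if (equal? id x)
         (letE x (subst id assignment assn) inner) ;; shadowed id
         (letE x (subst id assignment assn)
               (subst id assignment inner)))]))

(interp-let : (LetExpr -> Number))
(define (interp-let e)
  (type-case LetExpr e
    [(numE n) n]
    [(addE e1 e2) (+ (interp-let e1) (interp-let e2))]
    [(varE id) (error 'runtime "unbound variable")]
    [(letE var assignment body)
     (interp-let (subst var (interp-let assignment) body))]))

;; let x = 10 in x + x
(test (interp-let (letE 'x (numE 10) (addE (varE 'x) (varE 'x)))) 20)

;; let x = 10 + 20 in x + 1
(test (interp-let (letE 'x (addE (numE 10) (numE 20))
                        (addE (varE 'x) (numE 1))))
      31)

;; shadowing: let x = 10 in let x = 20 in x
(test (interp-let (letE 'x (numE 10) (letE 'x (numE 20) (varE 'x)))) 20)

;; the inner assignment still sees the outer x: let x = 1 in let x = x + 1 in x
(test (interp-let (letE 'x (numE 1)
                        (letE 'x (addE (varE 'x) (numE 1)) (varE 'x))))
      2)

;; subst rewrites the assignment of a shadowing let but not its body
(test (subst 'x 10 (letE 'x (varE 'x) (varE 'x)))
      (letE 'x (numE 10) (varE 'x)))

;; out-of-scope variable: let x = 10 in (let y = 10 in y) + y
(test/exn (interp-let (letE 'x (numE 10)
                            (addE (letE 'y (numE 10) (varE 'y))
                                  (varE 'y))))
          "unbound variable")
```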