Saturday, April 5, 2008

Runtime type-checking of recursive data types with lazy recursive functions

In Concepts, Techniques, and Models of Computer Programming van Roy and Haridi introduce a notation to describe recursive data types. For example, this is a possible description of binary trees (simplified from Section 4.3.1: Type Notation):

<BTree T> ::= tree(value: T
left: <BTree T>
right: <BTree T>)
| leaf(value: T)


This notation is close to the record syntax of Oz, which is no coincidence, given that the book uses Oz as its principle vehicle. Records in Oz are a very powerful and flexible data type. Even lists are just nested records in Oz, albeit supported by a special syntax. So records are a natural fit as an implementation technique for recursive data types.

Can we exploit this notation to check the well-formedness of trees and other recursive data types at runtime?1

Yes, we can! The trick is to use lazy functions to describe such data types.2

So we define types as functions with type variables as their arguments. In order to produce valid Oz code, the notation must be adjusted slightly. Instead of A|B, we write one_of(A B). Also, the angle brackets disappear. Where we refer back to the defined type itself, we need to use function call syntax, of course:


fun lazy {BTree T }
one_of(tree(value: T
left: {BTree T}
right: {BTree T}
)
leaf(value: T)
)
end


 
The result of {BTree int}, for example, is an "infinite" record that describes the set of binary trees with integer elements. {BTree {BTree atom}} describes binary trees of binary trees of atoms.
At this point, it becomes clear why BTree must be annotated as lazy: Otherwise the function would never return.

It is now pretty easy to implement a function that actually checks the type of concrete instances:


fun {Check Val Type}
case Type
of one_of(...) then
{Record.some Type fun {$ T} {Check Val T} end}
else
if {Atom.is Type} then
{Value.type Val} == Type
elseif {Record.is Type} then
{CheckRecordType Val Type}
end
end
end

fun {CheckRecordType Val Type}
{Record.is Val} andthen
{Record.label Val} == {Record.label Type} andthen
{Record.allInd Type
fun {$ Feat Field}
{HasFeature Val Feat} andthen {Check Val.Feat Field}
end
}
andthen % no additional fields in Val:
{All {Record.arity Val}
fun {$ Feat}
{HasFeature Type Feat}
end
}
end


 
The details of these two mutually recursive functions are not really interesting. They just walk through Val and Type in parallel and recursively check that every field in Val has the type that is specified in the corresponding field of Type.

This type of runtime type checking is quite costly. For a tree with N nodes, Check has a complexity of O(N). This might be okay in unit tests, but is probably too expensive for production code. Another drawback is that it does not work with cyclic data structures.

With a little helper procedure, we can assert the type of a value concisely:


proc {AssertType Val Type}
thread
if {Check Val Type} == false then
raise typeError(Val Type) end
end
end
end


 
In this way, it is even possible to assert the type of unbound variables.
Check is called concurrently, so it will not block the execution in this case. The type check of Val will be finished as soon as Val has been computed completely, thanks to dataflow programming and cheap threads in Oz.

Ideally, we'd like to print an informative error message which refers back to the source file and line number when a type check failed. This is currently only possible by using the deprecated macro facility of Oz.



1 Oz is a dynamically typed programming language. You cannot define recursive data types at compile time and neither the compiler nor the runtime will check the well-formedness of trees or other custom data types automatically.
2 For Haskellers, this will hardly come as a surprise. Constructors in Haskell are just a special kind of function that can be used in pattern matching.

No comments: