Saturday, April 5, 2008

Runtime type-checking of recursive data types with lazy recursive functions

In Concepts, Techniques, and Models of Computer Programming van Roy and Haridi introduce a notation to describe recursive data types. For example, this is a possible description of binary trees (simplified from Section 4.3.1: Type Notation):

<BTree T> ::= tree(value: T
left: <BTree T>
right: <BTree T>)
| leaf(value: T)


This notation is close to the record syntax of Oz, which is no coincidence, given that the book uses Oz as its principle vehicle. Records in Oz are a very powerful and flexible data type. Even lists are just nested records in Oz, albeit supported by a special syntax. So records are a natural fit as an implementation technique for recursive data types.

Can we exploit this notation to check the well-formedness of trees and other recursive data types at runtime?1

Yes, we can! The trick is to use lazy functions to describe such data types.2

So we define types as functions with type variables as their arguments. In order to produce valid Oz code, the notation must be adjusted slightly. Instead of A|B, we write one_of(A B). Also, the angle brackets disappear. Where we refer back to the defined type itself, we need to use function call syntax, of course:


fun lazy {BTree T }
one_of(tree(value: T
left: {BTree T}
right: {BTree T}
)
leaf(value: T)
)
end


 
The result of {BTree int}, for example, is an "infinite" record that describes the set of binary trees with integer elements. {BTree {BTree atom}} describes binary trees of binary trees of atoms.
At this point, it becomes clear why BTree must be annotated as lazy: Otherwise the function would never return.

It is now pretty easy to implement a function that actually checks the type of concrete instances:


fun {Check Val Type}
case Type
of one_of(...) then
{Record.some Type fun {$ T} {Check Val T} end}
else
if {Atom.is Type} then
{Value.type Val} == Type
elseif {Record.is Type} then
{CheckRecordType Val Type}
end
end
end

fun {CheckRecordType Val Type}
{Record.is Val} andthen
{Record.label Val} == {Record.label Type} andthen
{Record.allInd Type
fun {$ Feat Field}
{HasFeature Val Feat} andthen {Check Val.Feat Field}
end
}
andthen % no additional fields in Val:
{All {Record.arity Val}
fun {$ Feat}
{HasFeature Type Feat}
end
}
end


 
The details of these two mutually recursive functions are not really interesting. They just walk through Val and Type in parallel and recursively check that every field in Val has the type that is specified in the corresponding field of Type.

This type of runtime type checking is quite costly. For a tree with N nodes, Check has a complexity of O(N). This might be okay in unit tests, but is probably too expensive for production code. Another drawback is that it does not work with cyclic data structures.

With a little helper procedure, we can assert the type of a value concisely:


proc {AssertType Val Type}
thread
if {Check Val Type} == false then
raise typeError(Val Type) end
end
end
end


 
In this way, it is even possible to assert the type of unbound variables.
Check is called concurrently, so it will not block the execution in this case. The type check of Val will be finished as soon as Val has been computed completely, thanks to dataflow programming and cheap threads in Oz.

Ideally, we'd like to print an informative error message which refers back to the source file and line number when a type check failed. This is currently only possible by using the deprecated macro facility of Oz.



1 Oz is a dynamically typed programming language. You cannot define recursive data types at compile time and neither the compiler nor the runtime will check the well-formedness of trees or other custom data types automatically.
2 For Haskellers, this will hardly come as a surprise. Constructors in Haskell are just a special kind of function that can be used in pattern matching.

Wednesday, April 2, 2008

Debuggers are Needlessly Tedious to Use

Are debuggers useful? Giles Bowkett certainly doesn't think so. According to him, "debuggers encourage a backwards workflow". They seduce you into working in a trial-and-error way instead of producing clear, well-tested code. They make you step through your code again and again to check whether you finally got it right.

However, in my opinion, there are circumstances under which debuggers are useful:
  • when working your way through legacy code,
  • when encountering a strange bug that is completely beyond your understanding,
  • when experimenting with a new technology.

Nevertheless, many experienced developers find debuggers pretty useless. One reason for that is that debuggers are typically tedious to use. Stepping manually through code takes a lot of time. Especially if you are hunting for a bug and don't know where exactly the bug actually happens.

At every function call you face a choice: to step into the function or not to step into the function. If you optimistically don't step into it and find that its result is not what you expected, you have to do it all over again: setting a new breakpoint, restarting the program (or test suite)...Tedious!


A Proposal

Why do debuggers work this way? Wouldn't it be much more useful if debuggers could just show what goes on inside a function, without requiring you to step through the code again and again?

To explore the idea, I implemented a prototypical "source code annotating debugger"1 for a subset of the programming language Oz. I'm using Oz as a functional programming language here, but it does in fact support multiple styles of programming in a harmonious way.


Example

Let's look at a (somewhat contrived) example which contains a well-hidden bug:


declare

fun {RemoveEvenNumbers Xs}
case Xs of X|Xr then
if {IsOdd X} then
X|{RemoveEvenNumbers Xr}
else
{RemoveEvenNumbers Xr}
end
[] nil then nil
end
end

fun {Sum Xs}
{FoldL Xs Number.'+' 0}
end

fun {SumOfOddNumbers Xs}
{Sum
{RemoveEvenNumbers Xs}
}
end

proc {Main}
{System.show {SumOfOddNumbers [~3 ~2 ~1 0 1]}}
end

in
{Main}


 

(The example has to be executed in the "OPI", i.e. from within Emacs in "Oz mode". Don't forget to save the buffer first because the debugger needs a buffer filename to work correctly.
The Oz syntax may look unfamiliar first, but it is in fact not much different from other functional languages.
fun {Name FormalArg1 FormalArg2) ... end defines a function "Name" with two arguments.
{Name ActualArg1 ActualArg2} calls the defined function.
The case statement is a pattern matching construct and X|Xr constructs a list with the head X and the tail Xr.)

The example program is intended to calculate and display the sum of a list of numbers, only taking into account the odd numbers. When running the code (with Mozart version 1.3.2), the result is: 1. Huh? It should of course be -3 + -1 + 1 = -3.

With the cursor on the call to SumOfOddNumbers, the debugger is started with a keyboard shortcut (M-Shift-t). Its output is the following annotated code, shown in a new Emacs buffer:


fun {SumOfOddNumbers Xs}
{Sum
{RemoveEvenNumbers Xs} %%> Xs:[~3 ~2 ~1 0 1] {RemoveEvenNumber..}:[1]
} %%> {Sum {Remov..}:1
end


 

The problem seems to be the function RemoveEvenNumbers. Its result is the single-element list [1], which is certainly not expected.
We move the cursor to the call to RemoveEvenNumbers and press M-Shift-t again. The debugger will annotate the source code of RemoveEvenNumbers as if it was executed with the current parameters bindings:


fun {RemoveEvenNumbers Xs}
case Xs of X|Xr then %%> Xs:[~3 ~2 ~1 0 1] <<X|Xr>>:success
if {IsOdd X} then %%> X:~3 {IsOdd X}:false
X|{RemoveEvenNumbers Xr}
else
{RemoveEvenNumbers Xr}%%> Xr:[~2 ~1 0 1] {RemoveEvenNumber..}:[1]
end %%> <<if {IsOdd X} t..>>:[1]
[] nil then nil
end %%> <<case Xs of X|X..>>:[1]
end



 

Curiously, the system function IsOdd returns false for the input value ~3 (Oz notation for "-3"). This is unexpected! It looks like we found an error in the system libraries.

Let's check the source code of IsOdd (again with M-Shift-t):


fun {IsOdd X} X mod 2 == 1 end %%> X:~3 << X mod 2>>:~1 <<X mod 2..>>:false 



 

Finally, we found the real bug. IsOdd only works correctly for non-negative numbers because X mod 2 is negative for odd negative values of X. The underlying problem is that in Oz the modulo operation returns a negative result for negative dividends (as in many other languages), while in math, the result is always positive.

(In reality, I did not find this bug, but another Oz user did and the bug is already fixed in the subversion repository.)

Conclusion

With the standard Oz debugger Ozcar (which is very usable), it takes about 12 key strokes and some mouse work to find out the reason for the bug in this small example. Using the alternative approach, it takes two key strokes plus some navigating in Emacs.
I believe this is a significant improvement. The biggest advantage is that you can easily inspect the values of all variables and the result of function calls at every place within the debugged function - just by looking at the annotated source code.


Limitations

The current implementation has some serious limitations. Some of these can be overcome by more sophisticated implementations, with some I'm not so sure:
  • The display space for annotations is limited. Large data cannot be shown completely. This can be resolved by an implementation which allows to explore the annotations interactively.

  • The debugger shows inconsistent results if the 'world' changes between executions, e.g. if global state is involved or external files are read and written. To solve this, one needs a debugger which remembers the state of all variables, along the complete callgraph, such that multiple executions are not necessary. For large programs, this might be impractical.

  • One of the features of Oz which are not supported, are for loops. They are in fact difficult to support because it would be necessary to annotate the code with the data of every single run of the loop.
    Recursive function like FoldL are less problematic. You can simply press M-Shift-t again and again if you want to explore the nested calls to such a function.
    The proposed approach to debugging therefore seems to be more useful for (mostly) functional programming language like Haskell or Oz than for imperative languages like C++, which very much depend on the use of loops.





1The prototypical implementation of the source-code annotating debugger for Oz can be downloaded here.