Building a package that uses pattern matching
After a week spend programming string algorithms in C—for teaching purposes, I am not working on a new read-mapper—it is nice to get back to programming in R. I made a new release of tailr
today, so that is good, but what I really wanted to work on was matchbox
.
Where I left it last weekend, I had implemented various data structures—lists, stacks, queues, and search-trees—although search trees only with insertion yet. I wanted to work on the search trees today, first to implement plotting code to visualise them (for debugging purposes) and then to implement deletion. First, however, I wanted to make sure the package was in a good state.
It isn’t. It is actually the only package I have right now that doesn’t make it through Travis-CI.
There’s a couple of reasons that it fails, and it isn’t just failing on Travis. It is failing because the byte-complier croaks on some of the DSL constructions and the CMD CHECK on others.
Problems with byte-compilation
The first problem is with constructions where it, at first glance, looks like you are assigning to literals.
If we define a linked-list type like this
library(pmatch)
llist := NIL | CONS(car, cdr:llist)
then we an write a function that checks if a list is empty like this:
is_llist_empty <- function(llist) {
pmatch::cases(
llist,
NIL -> TRUE,
otherwise -> FALSE
)
}
The ->
assignments are part of the pmatch
DSL and not really assignments. Which is good, because it is incorrect to assign to TRUE
and FALSE
, just as it would be an error to assign to 1
or "foo"
.
The function works fine, though:
empty <- NIL
non_empty <- CONS(1, NIL)
is_llist_empty(empty)
## [1] TRUE
is_llist_empty(non_empty)
## [1] FALSE
Incidentally, if you thought it would be simpler to just compare llist == NIL
, you would be right. That would be simpler. But it doesn’t work with pmatch
because NIL
is actually just NA
with a class
attribute, and any comparison with NA
gives you NA
and not a boolean. So we do need pattern matching of some kind.
So what’s the problem? Clearly the function works!
Well, yes, but only because I do not try to byte-compile it.
compiler::cmpfun(is_llist_empty)
## Error: bad assignment: 'TRUE <- NIL'
If you put a function in a package, it will be byte-compiled by default. You can turn this off in your DESCRIPTION
file, but generally you want the optimisation you get from byte-compilation, so you do not want to turn it off.
You can cheat the byte-compiler using variables
is_llist_empty <- function(llist) {
t <- TRUE
f <- FALSE # to satisfy lintr
pmatch::cases(
llist,
NIL -> t,
otherwise -> f
)
}
compiler::cmpfun(is_llist_empty)
## function(llist) {
## t <- TRUE
## f <- FALSE # to satisfy lintr
## pmatch::cases(
## llist,
## NIL -> t,
## otherwise -> f
## )
## }
## <bytecode: 0x7f91699fa540>
But consider this function:
llcontains <- function(llist, elm) {
pmatch::cases(
llist,
NIL -> FALSE,
CONS(car, cdr) ->
if (car == elm) TRUE else llcontains(cdr, elm)
)
}
llcontains <- tailr::loop_transform(llcontains)
I am not byte-compiling it directly but I am inside tailr::loop_transform
, but more importantly, in this function I can “assign” NIL -> FALSE
and I do not get any errors when I build the package.
Similarly, I do not get any problems with this function
lldrop <- function(llist, k, acc = NIL) {
if (k == 0) return(llist)
pmatch::cases(
llist,
NIL -> stop("There were less than k elements in the list"),
CONS(car, cdr) -> lldrop(cdr, k - 1)
)
}
lldrop <- tailr::loop_transform(lldrop)
even though it is just as much a problem to assign to a call to stop
as it is to assign to TRUE
.
The reason that I do not have problems with these two, as you have no doubt guessed, is that I run them through tailr::loop_transform
. This function modifies the input function and replaces the pmatch::cases
DSL with a series of if
-else
-statements. The byte-compiler doesn’t see the DSL so it doesn’t complain about the odd “assignments”.
After the tailr
transformation, the functions actually look like this:
llcontains
## function (llist, elm)
## {
## .tailr_llist <- llist
## .tailr_elm <- elm
## callCC(function(escape) {
## repeat {
## llist <- .tailr_llist
## elm <- .tailr_elm
## if (!rlang::is_null(..match_env <- pmatch::test_pattern(llist,
## NIL)))
## with(..match_env, escape(FALSE))
## else if (!rlang::is_null(..match_env <- pmatch::test_pattern(llist,
## CONS(car, cdr))))
## with(..match_env, if (car == elm)
## escape(TRUE)
## else {
## .tailr_llist <<- cdr
## .tailr_elm <<- elm
## })
## }
## })
## }
## <bytecode: 0x7f9169afe5c0>
lldrop
## function (llist, k, acc = NIL)
## {
## .tailr_llist <- llist
## .tailr_k <- k
## .tailr_acc <- acc
## callCC(function(escape) {
## repeat {
## llist <- .tailr_llist
## k <- .tailr_k
## acc <- .tailr_acc
## {
## if (k == 0)
## escape(llist)
## if (!rlang::is_null(..match_env <- pmatch::test_pattern(llist,
## NIL)))
## with(..match_env, escape(stop("There were less than k elements in the list")))
## else if (!rlang::is_null(..match_env <- pmatch::test_pattern(llist,
## CONS(car, cdr))))
## with(..match_env, {
## .tailr_llist <<- cdr
## .tailr_k <<- k - 1
## })
## }
## }
## })
## }
## <bytecode: 0x7f9169917db8>
Now, I don’t want to run everything through the tail-recursion optimisation, of course. Not all functions can be transformed, and those that can but do not need to, will get slower.
Perhaps, though, I will need to write a similar transformation function in the pmatch
package that rewrites calls to cases
. I think that would be the solution to this issue.
Problems with CMD CHECK
So, I think I can solve the byte-compiler issue, but I have more of a problem with package checking.
There are some issues that are easy to solve. A package check will complain if there are unbound variables, which there are when we pattern match in pmatch::cases
. I can solve that by defining the variables and giving them dummy values—those variables will never be used, but it will shut up the checker.
What I don’t know how to handle is this error:
Error in attr(e, "srcref")[[i]] : subscript out of bounds
Calls: <Anonymous> ... <Anonymous> -> collectUsage -> collectUsageFun -> walkCode -> h
I get it for the functions I transform using tailr
. I figured it was because tailr::loop_transform
doesn’t set the "srcref"
attribute, so I tried changing that. I did something like this, just inside the loop_transform
function:
lldrop <- function(llist, k, acc = NIL) {
if (k == 0) return(llist)
pmatch::cases(
llist,
NIL -> stop("There were less than k elements in the list"),
CONS(car, cdr) -> lldrop(cdr, k - 1)
)
}
lldrop_tr <- tailr::loop_transform(lldrop)
attributes(lldrop_tr)
## NULL
lldrop_tr
## function (llist, k, acc = NIL)
## {
## .tailr_llist <- llist
## .tailr_k <- k
## .tailr_acc <- acc
## callCC(function(escape) {
## repeat {
## llist <- .tailr_llist
## k <- .tailr_k
## acc <- .tailr_acc
## {
## if (k == 0)
## escape(llist)
## if (!rlang::is_null(..match_env <- pmatch::test_pattern(llist,
## NIL)))
## with(..match_env, escape(stop("There were less than k elements in the list")))
## else if (!rlang::is_null(..match_env <- pmatch::test_pattern(llist,
## CONS(car, cdr))))
## with(..match_env, {
## .tailr_llist <<- cdr
## .tailr_k <<- k - 1
## })
## }
## }
## })
## }
## <bytecode: 0x7f91685cd1b8>
attr(lldrop_tr, "srcref") <- attr(lldrop, "srcref")
attributes(lldrop_tr)
## $srcref
## function(llist, k, acc = NIL) {
## if (k == 0) return(llist)
## pmatch::cases(
## llist,
## NIL -> stop("There were less than k elements in the list"),
## CONS(car, cdr) -> lldrop(cdr, k - 1)
## )
## }
lldrop_tr
## function(llist, k, acc = NIL) {
## if (k == 0) return(llist)
## pmatch::cases(
## llist,
## NIL -> stop("There were less than k elements in the list"),
## CONS(car, cdr) -> lldrop(cdr, k - 1)
## )
## }
## <bytecode: 0x7f91685cd1b8>
It is a bit of a cheat since I now claim that the modified function actually consists of the same code as the original, but I think that would be the way to go. If people want to see the source code of a function, they want the human readable version, not the transformed version.
It doesn’t help, though. I still get the error.
I don’t know what is causing this or how to debug it. I have never delved deep into package checking. I would love some pointers!