The Working R Programmer

Tips and tricks for serious software development in R

Update Is Not Update

I ran into this little problem yesterday and thought I would share it. I needed to collect a bunch of items into a list, and I needed to do that from several functions that served as callbacks.

Ok, that sounds abstract, so let me make it simple. I had a list

my_items <- list()

and one or more functions that should update it

f1 <- function(elm) {
  # add elm to my_items
}
f2 <- function(elm) {
  # add elm to my_items
}
…
fn <- function(elm) {
  # add elm to my_items
}

It was slightly more complicated because I didn’t want to use a global variable for my_items and so I wrapped all the functions in a (closure) scope, but there is no need for the added complexity here.

First, I did what you should never do and concatenated the list with the new element.

f1 <- function(elm) {
    my_items <<- c(my_items, elm)
}

I know that it is inefficient but it is easy to program, and if the list never gets long the performance is fine.[1]

The performance wasn’t fine.

So I had to change the append code.

Before we get to that, however, I want to draw your attention to the assignment here. The assignment operator is <<-. If it were <-, I would create a local variable holding the list c(my_items, elm), and I would not update the global variable.
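To make the difference concrete, here is a minimal sketch. The names counter, bump_local, and bump_global are made up for this illustration; they are not from my actual code.

```r
counter <- 0

# `<-` inside a function creates (or overwrites) a local variable.
bump_local <- function() {
  counter <- counter + 1    # reads the global value, writes a local variable
  invisible(NULL)
}

# `<<-` searches the enclosing environments and assigns where it finds the name.
bump_global <- function() {
  counter <<- counter + 1
  invisible(NULL)
}

bump_local()
after_local <- counter      # the global counter is untouched

bump_global()
after_global <- counter     # the global counter has been incremented
```

After bump_local() the global counter is still 0, and only bump_global() changes it to 1.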

Let that be a hint for what happens next…

If you want to append to a list then

my_items[[length(my_items) + 1]] <- elm

is the way to do it. You get amortised constant time appends this way (instead of linear time appends with c()).
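As a rough sketch, the two append styles can be put side by side like this. The names n, items_c, and items_idx are only for illustration. I wrap each element in list() in the c() version so that c() does not splice multi-element values into the result.

```r
n <- 1000

# Append with c(): each call builds a new list one element longer.
items_c <- list()
for (i in seq_len(n)) {
  items_c <- c(items_c, list(i))
}

# Append by assigning to the position one past the end.
items_idx <- list()
for (i in seq_len(n)) {
  items_idx[[length(items_idx) + 1]] <- i
}

# Both approaches should build the same list.
same_result <- identical(items_c, items_idx)
```

Wrapping either loop in system.time() and increasing n should make the difference visible. I have left out numbers because they depend on the machine and the R version.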

So, this looks reasonable, right?

f1 <- function(elm) {
    my_items[[length(my_items) + 1]] <- elm
}

It does look reasonable, but the problem is the <- assignment. It is less obvious here because it looks like we are updating the global variable rather than assigning to a local one. Assigning to a local variable, however, is exactly what we are doing.

When you update data in R, as a general rule, what you are actually doing is making a copy of the data with the modifications and then assigning the result back to the reference you have to the data.

The assignment here calls the [[<- function. When you assign to a function call (and [[ is a function call), R calls a replacement version of the function, named with a <- suffix, to produce the updated value. The updated value is then written to the variable you used. So what happens in my assignment from above is this: I get the global list, I modify it (which always means that I produce a new modified copy), and then I assign the result back to the name on the left-hand side. This, however, will be a new local variable. Even though I am modifying a list from a global variable, the way R handles indexing (and all <- functions) gives me a local variable.
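Written out by hand, the indexed assignment looks roughly like the sketch below. The function name broken_append is hypothetical, and the explicit call to `[[<-` is my approximation of what R does behind the scenes, not its literal internal code.

```r
my_items <- list("a")

broken_append <- function(elm) {
  # Roughly what `my_items[[length(my_items) + 1]] <- elm` expands to:
  # call the replacement function on the current value, then bind the
  # result to the name my_items in the function's own environment.
  my_items <- `[[<-`(my_items, length(my_items) + 1, elm)
  length(my_items)          # length of the local copy
}

local_length <- broken_append("b")
global_length <- length(my_items)   # the global list was never touched
```

The local copy has two elements when the function returns, while the global list still has one.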

The right solution, of course, is this:

f1 <- function(elm) {
    my_items[[length(my_items) + 1]] <<- elm
}
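The same fix works when the list lives in a closure instead of the global environment, which is closer to what I actually had. This is only a sketch: make_collector, add, and items are names I made up for it.

```r
make_collector <- function() {
  my_items <- list()        # lives in the closure, not the global environment

  add <- function(elm) {
    # `<<-` finds my_items in the enclosing (closure) environment
    # and updates it there instead of creating a local copy.
    my_items[[length(my_items) + 1]] <<- elm
    invisible(NULL)
  }

  items <- function() my_items

  list(add = add, items = items)
}

collector <- make_collector()
collector$add("a")
collector$add(1:3)
collected <- collector$items()
```

Here collected holds two elements, "a" and the vector 1:3, and nothing called my_items is created in the global environment.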

  1. Also, I didn’t think too hard about it. The efficient way to append isn’t complicated either; it is just not as readily available in my brain.