Question: Why does classing a list slow down the lengths() function?

Added at 2016-12-27 19:12
Question

I've just noticed that classing a list, whether by adding an extra label to its class attribute (S3) or by defining a new class that contains "list" (S4), drastically slows down the basic lengths() operation.

This suggests I should always unclass a "classed list" before calling lengths().

Can anyone

  1. explain why this happens, and/or

  2. suggest a better solution (or explain why this does not really matter since the differences are just microseconds in absolute terms).

Reproducible code:

# create a list of 1,000 elements with variable letter lengths
mylist <- list()
length(mylist) <- 1000
set.seed(99)
mylist <- lapply(mylist, function(x) sample(LETTERS, size = sample(1:100, size = 1), 
                                            replace = TRUE))

# create an S3 "classed" version
mylist_S3classed <- mylist
class(mylist_S3classed) <- c("myclass", "list")

# create an S4 classed version
setClass("mylist_S4class", contains = "list")
mylist_S4classed <- new("mylist_S4class", mylist)

# compare timings of lengths
microbenchmark::microbenchmark(lengths(mylist),
                               lengths(mylist_S3classed), 
                               lengths(mylist_S4classed),
                               unit = "relative")
## Unit: relative
##                      expr      min       lq     mean    median        uq       max neval
##           lengths(mylist)   1.0000   1.0000   1.0000   1.00000   1.00000   1.00000   100
## lengths(mylist_S3classed) 125.1433 119.3588 103.9747  91.90734  89.56034 291.97767   100
## lengths(mylist_S4classed) 162.4045 155.4870 119.0611 120.20908 111.95417  67.55309   100

## in absolute timings
microbenchmark::microbenchmark(lengths(mylist),
                               lengths(mylist_S3classed), 
                               lengths(mylist_S4classed))
## Unit: microseconds
##                      expr      min        lq       mean    median       uq      max neval
##           lengths(mylist)    6.401    6.9475    9.66612    9.4620   10.577   29.237   100
## lengths(mylist_S3classed)  792.738  851.0895  911.97067  898.0955  939.558 1604.189   100
## lengths(mylist_S4classed) 1050.448 1104.7920 1293.63965 1173.4545 1229.485 6431.130   100
Answers
Answer #1, added at 2016-12-27 20:12

This extra time is the time R takes to find the right length function. For a plain old list, that is easy and optimised: the length is probably stored right there in the object. Get it, return it.

For a classed object, be it S3 or S4, R has to find the right length function because length could be defined as a method. So R has to go on a hunt, and in your case it looks everywhere until it falls through to the default. By the time it's done that, it has spent that extra millisecond or so you see in the absolute timings.
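
To make the lookup visible, here is a minimal sketch that assumes a made-up S3 class "traced" (not part of the question's code). Its length method only counts how often it is reached and then falls through to the built-in default, so you can see that the class attribute is what sends length() through a method search.

```r
# Hypothetical S3 class, used only for this illustration.
x <- structure(list(1:3, letters), class = c("traced", "list"))

# Counter recording how often the method below is reached.
calls <- 0L

# A length method that counts its invocations and then
# falls through to the built-in default via NextMethod().
length.traced <- function(x) {
  calls <<- calls + 1L
  NextMethod()
}

n_classed <- length(x)           # class attribute present: the method is found and run
calls_after_classed <- calls     # 1

n_plain <- length(unclass(x))    # no class attribute: the method is never consulted
calls_after_plain <- calls       # still 1
```

Both calls return 2 here; the only difference is whether R had to go looking for a method first.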

Don't go unclassing things to speed this up unless you can promise your future self that you'll never write a length method for these objects; if you ever do, the unclassed calls will silently bypass it and your code will break...
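
As a sketch of how that breakage could look, assume a made-up class "bag" (again, not from the question) whose length is meant to be the number of stored items rather than the number of list components. Once such a method exists, unclassing quietly gives a different answer.

```r
# Hypothetical class: a two-component list where "length" is defined
# as the number of stored items, not the number of components.
bag <- structure(
  list(items = list("a", "b", "c"), label = "demo"),
  class = c("bag", "list")
)

# The method looks inside the object instead of counting components.
length.bag <- function(x) length(unclass(x)$items)

len_with_method <- length(bag)            # 3: uses length.bag
len_unclassed   <- length(unclass(bag))   # 2: counts the list components
```

Code written against unclass() would keep returning 2 after the method is introduced, which is the kind of silent disagreement the warning above is about.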
