R’s Object Systems

(well, just S3, S4 and R7)

Michael Jones

24 August 2022

Introduction

Have you ever wondered about print() and summary()?

Make some data

my_vector <- 1:5

my_lm <- lm(mpg ~ hp, data = mtcars)

my_df <- mtcars

print()

print(my_vector)
[1] 1 2 3 4 5
print(my_lm)

Call:
lm(formula = mpg ~ hp, data = mtcars)

Coefficients:
(Intercept)           hp  
   30.09886     -0.06823  
print(my_df)
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

summary()

summary(my_vector)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       2       3       3       4       5 
summary(my_lm)

Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
summary(my_df)
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  

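Possibly something like this?

summary <- function(x) {
  if (is.numeric(x)) {
    # ... summarise a numeric vector
  } else if (inherits(x, "lm")) {
    # ... summarise a linear model
  } else if (is.data.frame(x)) {
    # ... summarise each column
  }
  # ...
}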

but:

  • What happens when R core makes a new class?
  • What happens when we make a new class?
  • One function mixes duties: choosing the code for each class and doing the actual work

OO can help
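
As a preview of how OO helps, here is a minimal sketch of adding summary() support for a new class without editing summary() itself. The `reading` class and its `values` field are hypothetical, invented only for this illustration.

```r
# Hypothetical class for illustration: a list tagged with class "reading".
reading <- structure(list(values = c(2.5, 3.5, 6)), class = "reading")

# Defining a function called summary.reading() is enough:
# summary() itself is left untouched.
summary.reading <- function(object, ...) {
  cat("reading with", length(object$values), "values, mean",
      mean(object$values), "\n")
  invisible(object)
}

# summary() now picks the new method based on the class of its argument.
summary(reading)
```

The if/else chain never has to grow: each new class brings its own method.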

Learning Resources

  • Advanced R by Hadley Wickham (for S3 and S4)
  • R7 github repo: https://github.com/RConsortium/OOP-WG
  • Getting stuck in

When to use OO

  • Writing Packages
  • Writing code in your problem domain
    • stop talking about data.frame() and start talking about bond, equity or patient, hospital
    • Domain Driven Design by Scott Wlaschin
  • Nicer UIs
  • Better data organisation

When not to use OO

  • When existing things work fine
  • When you are “just” doing analysis
  • When it already exists in a package

Quick Aside:
Types of OO

“Normal” vs “Functional”

Normal (Python)

  • Objects encapsulate methods
  • Objects do things
    • dog.bark(at = "postman")

Functional (R)

  • Objects are mostly about holding data
  • Methods exist separate from Objects
    • interact(dog, postman)

Overview of R’s OO systems

S3

  • Simple and easy
  • Bare minimum
  • “Sure, yeah, that’s a $CLASS”

S4

  • More formal
  • Better for large teams
  • More complex

R7

  • New (experimental)
  • Simple but powerful

S3

Making a new class

human <- function(name, dob) {
  output <- list(
    name = name,
    dob = dob
  )
  class(output) <- "human"
  output
}

alice <- human(name = "Alice", dob = as.Date("1980-04-10"))

alice
$name
[1] "Alice"

$dob
[1] "1980-04-10"

attr(,"class")
[1] "human"

S3 Dispatch

  • You type: print(alice)
  • R sees:
    1. You want to print() something
    2. The thing has class "human"
  • R knows print is a “generic”
  • R checks for a print.human() function
  • If found, use that, i.e. print.human(alice)
  • If not, try the next class in the class vector, then print.default(); error only if nothing matches
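
The steps above can be imitated in a few lines. This is only a rough sketch: `find_method()` is a hypothetical helper, the real lookup is done by UseMethod(), and the sketch ignores implicit classes (such as "numeric" for an integer vector).

```r
# Same constructor as on the earlier slide.
human <- function(name, dob) {
  output <- list(name = name, dob = dob)
  class(output) <- "human"
  output
}
alice <- human(name = "Alice", dob = as.Date("1980-04-10"))

# Hypothetical helper: walk the class vector, then fall back to "default".
find_method <- function(generic, x) {
  for (cls in c(class(x), "default")) {
    fn <- utils::getS3method(generic, cls, optional = TRUE)
    if (!is.null(fn)) return(paste0(generic, ".", cls))
  }
  stop("no applicable method for '", generic, "'")
}

# No print.human() exists yet, so the default method would be used.
before <- find_method("print", alice)

# Once a method for the class exists, it is found first.
print.human <- function(x, ...) {
  cat("<human>", x$name, "born", format(x$dob), "\n")
  invisible(x)
}
after <- find_method("print", alice)

print(alice)
```

Here `before` holds the name of the default method and `after` the name of the new class-specific one.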

Generics and Methods

  • Generics:
    • The function you type
    • “works” on many different classes
  • Methods:
    • You rarely call these directly
    • Each does the specific work for one specific class
  • summary() is a generic; summary.lm() and summary.data.frame() are methods (plain numeric vectors are handled by summary.default())
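
You can check this from the console. The sketch below uses only base R and utils; the variable names are just for illustration.

```r
# The generic's body is only a dispatch call; the real work lives in the methods.
generic_body <- deparse(body(summary))
generic_body

# Methods are ordinary functions named generic.class.
has_lm <- is.function(utils::getS3method("summary", "lm"))
has_df <- is.function(utils::getS3method("summary", "data.frame"))

# Anything without a dedicated method ends up in the default method.
has_default <- is.function(utils::getS3method("summary", "default"))

# List everything registered for the generic (long output, so just count).
length(methods("summary"))
```

Running methods("summary") on its own prints the full list, which is a quick way to see which classes a generic already supports.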

Make your own Generics and Methods

# Generic
introduce <- function(x, ...) {
  UseMethod("introduce")
}

# Method (arguments match the generic's)
introduce.human <- function(x, ...) {
  cat("Hello, my name is", x$name)
}


introduce(alice)
Hello, my name is Alice

New Classes

r_user <- function(name, dob, favourite_package) {
  output <- list(
    name = name,
    dob = dob,
    favourite_package = favourite_package
  )
  class(output) <- c("ruser", "human")
  output
}

bob <- r_user(name = "Bob",
              dob = as.Date("1985-10-13"),
              favourite_package = "ggplot2")

What do you think will happen if we introduce(bob)?

hint:

class(bob)
[1] "ruser" "human"
introduce(bob)
Hello, my name is Bob

We get the introduce.human() method. There is no introduce.ruser() yet, so R works its way along the class vector until it finds a class that has a method.

A more specific method

introduce.ruser <- function(x, ...) {
  cat("Hello, my name is",
      x$name,
      "and I'm a big fan of",
      x$favourite_package)
}

introduce(bob)
Hello, my name is Bob and I'm a big fan of ggplot2

Pitfalls

An object is whatever its class attribute says it is:

class(bob) <- "lm"

print(bob)

Call:
NULL

No coefficients

Nothing checks that an object's contents match its class, so quality is harder to enforce.

S3 also does only single dispatch: the method is chosen from the class of one argument, by default the first.
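
One common way to get some of that quality back is to validate inside the constructor. This is a minimal sketch, not something S3 provides for you; `new_human()` is a hypothetical name, and the checks are only examples.

```r
# Hypothetical validating constructor: check inputs before tagging the class.
new_human <- function(name, dob) {
  stopifnot(
    is.character(name), length(name) == 1,
    inherits(dob, "Date"), length(dob) == 1
  )
  structure(list(name = name, dob = dob), class = "human")
}

alice <- new_human("Alice", as.Date("1980-04-10"))

# A numeric date of birth is rejected at construction time.
bad <- try(new_human("Alice", dob = 123), silent = TRUE)
inherits(bad, "try-error")
```

This only guards the front door: anyone can still run class(x) <- "human" on an arbitrary object afterwards.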

S4


  • S4 is more formal, and more verbose
  • But it is more powerful
  • Common in Bioconductor

S4 humans

setClass(
  "Human",
  slots = c(
    name = "character",
    dob = "Date"
  )
)

S4 Alice

We get some error checking for free:

alice <- new("Human", name = "Alice", dob = 123)
Error in validObject(.Object): invalid class "Human" object: invalid object for slot "dob" in class "Human": got class "numeric", should be or extend class "Date"
alice <- new("Human", name = "alice", dob = as.Date("1980-04-10"))
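
Slot types are checked automatically; rules beyond the type can be added with a validity function. The sketch below uses a separate, hypothetical class name (`ValidatedHuman`) so it does not overwrite the `Human` class from the slides, and the single-name rule is only an example.

```r
library(methods)

# Hypothetical variant of the Human class with one extra rule.
setClass(
  "ValidatedHuman",
  slots = c(name = "character", dob = "Date"),
  validity = function(object) {
    # Return TRUE when valid, otherwise a message describing the problem.
    if (length(object@name) != 1) "name must be a single string" else TRUE
  }
)

ok <- new("ValidatedHuman", name = "Alice", dob = as.Date("1980-04-10"))

# Wrong slot type: caught by the built-in check.
bad_type <- tryCatch(
  new("ValidatedHuman", name = "Alice", dob = 123),
  error = function(e) conditionMessage(e)
)

# Right types, but breaks the extra rule: caught by the validity function.
bad_len <- tryCatch(
  new("ValidatedHuman", name = c("Alice", "Bob"), dob = as.Date("1980-04-10")),
  error = function(e) conditionMessage(e)
)
```

Both `bad_type` and `bad_len` end up holding the error message rather than an object.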

Accessing Data

You can do:

alice@name
[1] "alice"

Note we’re using @ to access slots, rather than $.

Accessing Data

But it’s better to make getters and setters:

setGeneric("name", function(x) standardGeneric("name"))
[1] "name"
setGeneric("name<-", function(x, value) standardGeneric("name<-"))
[1] "name<-"
setMethod("name", "Human", function(x) x@name)
setMethod("name<-", "Human", function(x, value) {
  x@name <- value
  x
})

name(alice)
[1] "alice"
name(alice) <- "Alice Bloggs"
name(alice)
[1] "Alice Bloggs"

Or just a getter

setGeneric("favPackage", function(x) standardGeneric("favPackage"))
[1] "favPackage"
# The "Ruser" class is defined on the next slide; define it before running this
setMethod("favPackage", "Ruser", function(x) x@favourite_package)

  • Slots should be considered “developer-only” as they can change
  • Getters and Setters create a user-facing interface

S4 R users

We can inherit directly

setClass(
  "Ruser",
  contains = "Human",
  slots = c(
    favourite_package = "character"
  )
)

S4 R users

and then create objects of the subclass, supplying the inherited slots too

bob <- new(
  "Ruser",
  name = "Bob",
  dob = as.Date("1985-10-13"),
  favourite_package = "ggplot2")

bob
An object of class "Ruser"
Slot "favourite_package":
[1] "ggplot2"

Slot "name":
[1] "Bob"

Slot "dob":
[1] "1985-10-13"

Multiple Dispatch

Generic:

setGeneric(
  "Meets",
  function(x, y) standardGeneric("Meets"),
  signature = c("x", "y")
)
[1] "Meets"

Method:

setMethod(
  "Meets",
  signature = c("Human", "Human"),
  function(x, y) {
    cat(name(x), "says hello to", name(y))
  })

Meets(alice, bob)
Alice Bloggs says hello to Bob

When R users get involved

setMethod(
  "Meets",
  signature = c("Ruser", "Human"),
  function(x, y) {
    cat(name(x), "says hello to",
        name(y), "and tries not to talk about",
        favPackage(x))
  }
)

Meets(bob, alice)
Bob says hello to Alice Bloggs and tries not to talk about ggplot2

Or to each other

setMethod(
  "Meets",
  signature = c("Ruser", "Ruser"),
  function(x, y) {
    cat(name(x), "tells",
        name(y), "all about",
        favPackage(x))
  }
)

charlie <- new("Ruser", name = "Charlie",
               dob = as.Date("1990-03-05"),
               favourite_package = "dplyr")

Meets(bob, charlie)
Bob tells Charlie all about ggplot2

Multiple Dispatch

Multiple dispatch is a powerful way to control how different classes interact

  • e.g. you have a general algorithm that works for a wide range of classes of input
  • Make a generic and a “default” method
  • For a narrower range of classes of input, there’s a more performant algorithm
  • Implement a method that dispatches off the specific argument classes

Listing Available Methods

methods("Meets")
[1] Meets,Human,Human-method Meets,Ruser,Human-method Meets,Ruser,Ruser-method
see '?methods' for accessing help and source code
cat(paste(methods("Meets"), collapse = "\n"))
Meets,Human,Human-method
Meets,Ruser,Human-method
Meets,Ruser,Ruser-method

R7


  • Currently exists as a package, and it’s not yet on CRAN
  • Experimental, but aiming to be incorporated into R proper
  • Developed by representatives from RStudio/Posit, R-Core, Bioconductor

library(R7)

R7 Humans

human <- new_class(
  "human",
  properties = list(
    name = class_character,
    dob = class_Date
  )
)

alice <- human(
  name = "alice",
  dob = as.Date("1980-04-10"))

alice
<human>
 @ name: chr "alice"
 @ dob : Date[1:1], format: "1980-04-10"

Accessors

Unlike in S4, using @ directly does not seem to be discouraged

alice@name
[1] "alice"

Generics and Methods

introduce <- new_generic(
  "introduce",
  "x"
  )

method(introduce, human) <-
  function(x) {
    cat("Hello, I'm", x@name)
  }

introduce(alice)
Hello, I'm alice

Inheritance

r_user <- new_class(
  "ruser",
  parent = human,
  properties = list(
    favourite_package = class_character
  )
)

bob <- r_user(
  name = "Bob",
  dob = as.Date("1985-10-13"),
  favourite_package = "ggplot2"
)

Multiple Dispatch

meets <- new_generic(
  "meets",
  c("x", "y")
)

method(meets,
       list(human, human)) <- 
  function(x, y) {cat(x@name, "greets", y@name)}

method(meets,
       list(r_user, r_user)) <-
  function(x, y) {cat(x@name, "tells",
                      y@name, "all about",
                      x@favourite_package)}

Multiple Dispatch

meets(alice, alice)
alice greets alice
meets(alice, bob)
alice greets Bob
meets(bob, alice)
Bob greets alice
meets(bob, bob)
Bob tells Bob all about ggplot2

Closing


  • R’s object systems are powerful tools for abstraction
  • But you might not need them day to day
  • If you do, start with S3
  • R7 is exciting and coming soon