Functions to crash
R has a few ways to signal that something has gone wrong.
For example, there’s the aptly-named stop() for when you need the program to stop completely:
stop("There was an error")
## Error in eval(expr, envir, enclos): There was an error
Or the newer cli::cli_abort():
cli::cli_abort("There was an error")
## Error:
## ! There was an error
As an aside, I like cli::cli_abort(). I think it’s what powers most, if not all, of the tidyverse errors (taking over from rlang::abort()).
It comes with {glue}-like syntax and fancier error prompts:
x <- 100
cli::cli_abort(c("There was an error because x had value: {x}",
                 "i" = "x cannot have value ",
                 "x" = "Don't do that again."))
## Error:
## ! There was an error because x had value: 100
## ℹ x cannot have value
## ✖ Don't do that again.
But throwing an error like this will stop your program. If the code is being used interactively, as a lot of R code is, that’s probably the right thing to do.
If you write an informative error message that allows the user to pinpoint the issue, they can quickly fix it and get on their way.
For example, say we try to access a column that doesn’t exist using dplyr::select():
library(dplyr)
library(palmerpenguins)
penguins %>% select(mpg)
## Error in `select()`:
## ! Can't subset columns that don't exist.
## ✖ Column `mpg` doesn't exist.
This immediately breaks and tells you exactly what’s wrong.
But we have other options. What if we wanted to handle functions that could fail in a more elegant way?
Option or Maybe
Many more strongly typed languages have things called Option¹ or Maybe².
These can be thought of as a box that might contain a value, but could also contain nothing.
If you have a function that you’d really like to get an integer from usually, you may find when you write that function that there are cases where you won’t get an integer back.
You could raise an error, but if you want to be fancier, you could write your whole function to return a Maybe Int, that is, an integer wrapped in this Maybe box.
In the case that the function works, you are provided with a box that has your integer in it.
This is often called a Just, as in Just 4 or Just 1003.
The “Just” bit means that the function worked.
In the case that the function did not work, you’d get back a Nothing.
No error is thrown, but you know the function didn’t work because of what you got back.
The key here is that both of these things, Just 4 and Nothing, are of the type Maybe in more strongly typed languages.
R doesn’t have a fantastic type system but we can pretend a bit using the S3 class system.
Maybe in R
So we first need a way to make things of class maybe.
Recall that we want to be able to model two states:
- a “success” state
- a “failure” state
To do this, we will make two constructor functions for objects in our maybe class.
These are:
# Success state: wraps a value
just <- function(x) {
  output <- list(
    value = x,
    constructor = "just"
  )
  class(output) <- "maybe"
  output
}
# Failure state: carries no value, so it takes no argument
nothing <- function() {
  output <- list(
    constructor = "nothing"
  )
  class(output) <- "maybe"
  output
}
Note the following things:
- In both instances we make an object with class maybe.
- Both constructors are modelled as a list with a constructor element that holds whether this is a just or a nothing.
- The nothing has no value in its list.
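Because both states are the same class, the constructor element is the only way to tell them apart. A minimal sketch of some helpers for doing that follows; is_just(), is_nothing() and from_maybe() are hypothetical names of my own choosing and are not used elsewhere in this post.

```r
# Constructors as described above, written compactly with structure()
just <- function(x) structure(list(value = x, constructor = "just"), class = "maybe")
nothing <- function() structure(list(constructor = "nothing"), class = "maybe")

# Hypothetical helpers (assumed names, not part of the post's code)
is_just <- function(m) inherits(m, "maybe") && m$constructor == "just"
is_nothing <- function(m) inherits(m, "maybe") && m$constructor == "nothing"

# Unwrap a maybe, falling back to `default` when it is a nothing
from_maybe <- function(m, default) {
  if (is_just(m)) m$value else default
}

is_just(just(4))
is_nothing(nothing())
from_maybe(just(4), default = 0)
from_maybe(nothing(), default = 0)
```

Something like from_maybe() is the usual way out of the box at the end of a computation, once you are ready to decide what a failure should turn into.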
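We may as well add a nice print method too:
# `...` matches the print() generic; returning invisible(x) is the usual convention
print.maybe <- function(x, ...) {
  if (x$constructor == "just") {
    cat("Just\n")
    print(x$value)
  } else if (x$constructor == "nothing") {
    cat("Nothing\n")
  }
  invisible(x)
}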
This lets us show both states fairly neatly:
just(4)
## Just
## [1] 4
nothing()
## Nothing
Functions that Fail Gracefully
So now, let’s wrap some tidyverse functions to make versions that might fail, but do so by returning maybes.
Recall that earlier we couldn’t dplyr::select() a column that didn’t exist:
select(penguins, mpg)
## Error in `select()`:
## ! Can't subset columns that don't exist.
## ✖ Column `mpg` doesn't exist.
What we want is a new function, which we will call mselect(), that will return:
- just(result) if the function works, or
- nothing() if the function fails
I’m going to implement this as a simple wrapper over the {dplyr} function that just uses tryCatch to detect the error:
mselect <- function(.data, ...) {
  tryCatch(just(dplyr::select(.data, ...)),
           error = function(e) nothing())
}
We’re accepting the same arguments as dplyr::select() and trying them in that function.
The tryCatch function detects whether dplyr::select() throws an error:
- if it doesn’t, then tryCatch returns the result
- if it does, we execute the error function and return that result. In this case, nothing()
In practice, this will look like:
# works
mselect(penguins, species)
## Just
## # A tibble: 344 × 1
## species
## <fct>
## 1 Adelie
## 2 Adelie
## 3 Adelie
## 4 Adelie
## 5 Adelie
## 6 Adelie
## 7 Adelie
## 8 Adelie
## 9 Adelie
## 10 Adelie
## # … with 334 more rows
# doesn't work
mselect(penguins, mpg)
## Nothing
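The same tryCatch pattern can be tried without {dplyr} at all. The sketch below is a base R analogue that I am assuming for illustration: mselect_base() is a hypothetical name, and it relies on data frame indexing raising an error for unknown column names.

```r
just <- function(x) structure(list(value = x, constructor = "just"), class = "maybe")
nothing <- function() structure(list(constructor = "nothing"), class = "maybe")

# Hypothetical base R counterpart to mselect(): select columns by name
mselect_base <- function(.data, cols) {
  tryCatch(just(.data[, cols, drop = FALSE]),  # errors on unknown column names
           error = function(e) nothing())      # any error becomes a nothing
}

ok <- mselect_base(mtcars, "mpg")
ok$constructor
names(ok$value)

bad <- mselect_base(mtcars, "species")
bad$constructor
```

Only errors are turned into a nothing here. Warnings are not handled by this tryCatch call, so they are signalled as usual.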
Fantastic, but now we can’t pipe:
mselect(penguins, species) %>%
  filter(species == "Adelie")
## Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "maybe"
This fails because dplyr::filter() expects a data.frame, but got a maybe instead.
We could re-write all our functions to accept a maybe and then give back a maybe.
But then every function would carry duplicated code to check whether the maybe is a just value or a nothing and, if it is a just value, to extract that value and run the function.
That would mean loads of duplication, and the functions themselves wouldn’t be doing a single thing well; they would be doing multiple things, which is generally a bad way to design functions.
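To make the duplication concrete, here is a rough sketch of what two hand-written maybe-aware functions might look like. maybe_head() and maybe_nrow() are hypothetical names wrapping base R functions, chosen only so the example runs without any packages.

```r
just <- function(x) structure(list(value = x, constructor = "just"), class = "maybe")
nothing <- function() structure(list(constructor = "nothing"), class = "maybe")

# Hypothetical maybe-aware wrapper around head()
maybe_head <- function(m, n) {
  if (!inherits(m, "maybe")) stop("type error")      # 1. check the input type
  if (m$constructor == "nothing") return(nothing())  # 2. propagate earlier failures
  tryCatch(just(head(m$value, n)),                    # 3. unwrap, run, re-wrap
           error = function(e) nothing())
}

# Hypothetical maybe-aware wrapper around nrow(): same three steps again
maybe_nrow <- function(m) {
  if (!inherits(m, "maybe")) stop("type error")
  if (m$constructor == "nothing") return(nothing())
  tryCatch(just(nrow(m$value)),
           error = function(e) nothing())
}

res <- maybe_nrow(maybe_head(just(mtcars), 3))
res$value

failed <- maybe_nrow(maybe_head(nothing(), 3))
failed$constructor
```

Only the call in the middle of each function differs; the checking and re-wrapping around it is identical, which is the part worth factoring out.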
More fun with pipes
But, we can do interesting things with pipes.
One of these will be to handle all the unwrapping and re-wrapping of our maybe values for us.
The full definition is:
`%>>=%` <- function(lhs_prime, rhs) {
  if (!inherits(lhs_prime, "maybe")) {
    stop("type error")
  }
  if (lhs_prime$constructor == "nothing") {
    nothing()
  } else {
    # Regular pipe stuff
    rhs <- substitute(rhs)
    lhs <- substitute(lhs_prime$value)
    kind <- 1L
    env <- parent.frame()
    lazy <- TRUE
    .External2(magrittr:::magrittr_pipe)
  }
}
Things to note:
- We first check to make sure we’re dealing with maybe values when we use this pipe. Fancier languages with fancier type systems would do this for us with their type-checking facilities as part of the language, but us type-paupers have to do it ourselves.
- Then we check if we are getting a maybe value that’s a nothing. If we are, then whatever the function in the rhs is, we know we should be returning nothing. A function whose inputs have failed must surely fail as well.
- Finally, if we have a just value, we pull out the value and then do regular pipe stuff from {magrittr}³.
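For comparison, the same three steps can be sketched as an ordinary function that takes the next step as a function argument, which avoids relying on {magrittr} internals. This is a swapped-in alternative rather than the pipe defined above; bind_maybe() and safe_sqrt() are hypothetical names.

```r
just <- function(x) structure(list(value = x, constructor = "just"), class = "maybe")
nothing <- function() structure(list(constructor = "nothing"), class = "maybe")

# Hypothetical function-style bind: `f` must itself return a maybe
bind_maybe <- function(m, f, ...) {
  if (!inherits(m, "maybe")) stop("type error")      # type check
  if (m$constructor == "nothing") return(nothing())  # failed input, failed output
  f(m$value, ...)                                    # unwrap and apply
}

# A toy function that fails gracefully on non-numeric or negative input
safe_sqrt <- function(x) {
  if (is.numeric(x) && x >= 0) just(sqrt(x)) else nothing()
}

ok <- bind_maybe(bind_maybe(just(16), safe_sqrt), safe_sqrt)
ok$value

bad <- bind_maybe(bind_maybe(just(-4), safe_sqrt), safe_sqrt)
bad$constructor
```

The nested calls read inside-out, which is exactly the awkwardness an infix pipe removes.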
To demonstrate, let’s make another tidyverse function into a version that returns a maybe.
mfilter <- function(.data, ...) {
  tryCatch(just(dplyr::filter(.data, ...)),
           error = function(e) nothing())
}
It’s to dplyr::filter() what mselect() was to dplyr::select().
And now we can do things like:
just(penguins) %>>=%
  mselect(species) %>>=%
  mfilter(species == "Chinstrap")
## Just
## # A tibble: 68 × 1
## species
## <fct>
## 1 Chinstrap
## 2 Chinstrap
## 3 Chinstrap
## 4 Chinstrap
## 5 Chinstrap
## 6 Chinstrap
## 7 Chinstrap
## 8 Chinstrap
## 9 Chinstrap
## 10 Chinstrap
## # … with 58 more rows
And our pipe-chain can fail at any point, but the nothing values will be seamlessly propagated through:
just(penguins) %>>=%
  mselect(mpg) %>>=% # <-- fails here
  mfilter(species == "Chinstrap")
## Nothing
or
just(penguins) %>>=%
  mselect(species) %>>=%
  mfilter(mpg == "Chinstrap") # <-- fails here
## Nothing
We have to start with just(penguins) rather than penguins, as %>>=% expects a maybe on the left.
If this bothers you, you could make a new infix pipe that doesn’t need a maybe on the left and use that as the first pipe.
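A lighter-weight alternative to a second infix operator is a small coercion helper for the start of the chain. The sketch below assumes a hypothetical as_maybe() that wraps bare values and leaves existing maybe values alone.

```r
just <- function(x) structure(list(value = x, constructor = "just"), class = "maybe")
nothing <- function() structure(list(constructor = "nothing"), class = "maybe")

# Hypothetical helper: lift a bare value into a maybe, but never double-wrap
as_maybe <- function(x) {
  if (inherits(x, "maybe")) x else just(x)
}

as_maybe(mtcars)$constructor
as_maybe(nothing())$constructor
identical(as_maybe(just(1)), just(1))
```

With the definitions from this post, a chain could then presumably start with as_maybe(penguins) in place of just(penguins), and it would not matter whether the starting value was already wrapped.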
Simpler Piping
But we’ve had to individually redesign all our tidyverse functions.
I’d prefer not to have to copy and paste them all and make m* versions.
Luckily, we don’t have to if we notice that mselect() and mfilter() both follow a similar pattern.
We can exploit this pattern by making a function factory: a new function that will accept a tidyverse function as an argument and make a new version that returns a maybe.
mwrap <- function(f) {
  function(...) {
    tryCatch(just(f(...)),
             error = function(e) nothing())
  }
}
Note how this function itself returns a function that looks like mselect() and mfilter() (if you blur your eyes a bit).
This means we can do pipelines like this:
just(penguins) %>>=%
  mwrap(group_by)(species) %>>=%
  mwrap(summarise)(bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
                   bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE)) %>>=%
  mwrap(mutate)(bill_area = bill_length_mm * bill_depth_mm)
## Just
## # A tibble: 3 × 4
## species bill_length_mm bill_depth_mm bill_area
## <fct> <dbl> <dbl> <dbl>
## 1 Adelie 38.8 18.3 712.
## 2 Chinstrap 48.8 18.4 900.
## 3 Gentoo 47.5 15.0 712.
And if this fails at any point, we will get a nothing back.
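The factory is not tied to the tidyverse. As a quick sketch using only base R functions (mlog is a name I am making up here), any function that signals an error can be wrapped the same way:

```r
just <- function(x) structure(list(value = x, constructor = "just"), class = "maybe")
nothing <- function() structure(list(constructor = "nothing"), class = "maybe")

# The factory from above: turn any function into one that returns a maybe
mwrap <- function(f) {
  function(...) {
    tryCatch(just(f(...)),
             error = function(e) nothing())
  }
}

mlog <- mwrap(log)  # hypothetical name for the wrapped log()

mlog(100, base = 10)$value
mlog("a")$constructor                    # log() errors on character input

# Arguments are evaluated inside tryCatch, so errors raised while
# evaluating them are caught as well
mwrap(sum)(1, stop("boom"))$constructor
```

The last call shows a side effect of R’s lazy evaluation: because the arguments are only forced inside f(...), a failure in computing an argument is treated the same as a failure in the function itself.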
Conclusion
This implementation suffers a bit because R’s type system isn’t that strong.
In languages like Haskell, Maybes are used a lot, and the >>= syntax is far more powerful than what I have implemented here⁴.
Is this a useful thing? I don’t know, that’s up to you.
Is this an interesting thing? Yes, that’s up to me, and yes it is.