There are two things that are important to keep in mind reading this:
- Writing pure functions is helpful for a whole load of reasons.
- Being able to compose smaller functions into a larger whole is a really good way to build complex processes
And a third one:
- Sometimes over-engineering something is fun and helps you learn.
Today, we’re going to discuss a way to handle a computation that is performed in some environment. By “environment”, I mean some set of configuration settings. The situation would be that we want to define some process that takes an input and some settings and then manipulates the input with reference to the settings.
- Different input, same settings: different output
- Same input, different settings: also different output
But the idea is that your settings are conceptually different from your inputs. E.g. a config file vs actual data or something like that.
You could build your environment to be some complex object, but I’m going to simply use a list:
e1 <- list(a = 2, b = 4, c = 5)
Each element of the list is some parameter that we want to have available in our computation.
But we’d like to do it by composing pure functions.
First Steps
The first and easiest way to take advantage of this environment is to simply define functions that reference their parent scope. R is a lexically scoped language. Basically, functions in R know where they were defined and can access variables in that location. If we define this a function like this:
uses_global_variables <- function(x) {
x + e1$a
}
We take in an argument, x
, but also use the a
value from e1
:
uses_global_variables(2)
## [1] 4
uses_global_variables(10)
## [1] 12
This has some upsides:
Firstly, it’s easily composed1:
uses_global_variables_twice <- purrr::compose(
uses_global_variables,
uses_global_variables
)
uses_global_variables_twice(2)
## [1] 6
Or perhaps
2 |>
uses_global_variables() |>
uses_global_variables()
## [1] 6
Secondly, depending on your context, you could change a value in e1
, load it into your session again, and uses_global_variables()
would know immediately the new value of a
to use.
But that could also be a downside. This reliance on something defined externally to the function means that it’s a impure function. I.e. it is not a ‘pure function’.
As a recap, pure functions are functions that:
- Given any particular input, they will always return the same input
- They do not have any ‘side effects’. This essentially means that they don’t read/write to files, they don’t have interaction with the user, they don’t throw errors etc. Strictly speaking, they wouldn’t even write to a console.
A heuristic for whether a function is pure is whether you could memoise it. Essentially, could you skip calculations and just look the answer up in a big reference table and the caller would be none the wiser? Writing as much of your program as you can with pure functions is generally a good strategy since pure functions are easy to reason about.
Anything pure functions use, they will take in as an argument so all the “entry points” for information are right there in the function definition. They rely on nothing that’s outside the function definition (i.e. nothing that can change without redefining the function - you can rely on functions from external libraries etc2).
A pure version of uses_global_variables()
would be:
takes_e_locally <- function(x, e) {
x + e$a
}
And this produces the same output as uses_global_variables()
, though we also have to pass in the environment e
:
takes_e_locally(2, e1)
## [1] 4
takes_e_locally(10, e1)
## [1] 12
Pure functions are also easier to test.
If we wanted to test uses_global_variables()
against some expected output, we’d need to get an e1
from somewhere and inject it in somehow.
But in contrast, we can test takes_e_locally()
simply by passing in some e
:
testthat::test_that("takes_e_locally works", {
etest <- list(a = 1)
testthat::expect_equal(
takes_e_locally(x = 2, e = etest),
3
)
})
## Test passed 😀
So takes_e_locally()
passes the purity test.
But we’ve just lost something that uses_global_variables()
had:
(convenient) composability.
As the input to the function is a value and an environment to use, but the output is just a number we can’t compose it the same way:
takes_e_locally_twice <- purrr::compose(
takes_e_locally,
takes_e_locally
)
takes_e_locally_twice(2)
## Error in `_fn`(2): argument "e" is missing, with no default
We’d need to keep passing in the environment to every function we call. We could do it like this:
takes_e_locally(2, e1) |>
takes_e_locally(e1)
## [1] 6
But that repetition of e1
means we need to pass it to every function in the chain that might need it and it could cause issues if we want to change the reference to some e2
, but we forget one e1
in the chain.
It would be better to be able to only pass in the environment config once and have the chain be aware of it, providing it to the functions automatically.
Making things more complicated to make them simpler
Bear with me here, things will get a little more complicated before they get simpler, but it’s worth the effort.
The first thing to notice about takes_e_locally()
(a function of two arguments) is that we can re-write this into a function that takes one argument but itself returns another function that takes another argument:
takes_e_locally_curried <- function(x) {
function(e) {
x + e$a
}
}
takes_e_locally_curried(2)(e1)
## [1] 4
This is called a curried function.
As shown above, we can evaluate the function with these double brackets: takes_e_locally_curried(2)
produces a function which we then pass e1
to and we get back the result.
takes_e_locally_curried(2)
essentially defines a computational process that, given some environment, carries out that computational process in the context of that environment.
We will call this first stage (before passing in an environment) a “reader”, as we get back a function that “reads” from the environment e
that we pass it.
To make it a little easier, and avoid double parentheses, let’s make a function that runs a reader for us:
run_reader <- function(reader, e) {
reader(e)
}
my_reader <- takes_e_locally_curried(2)
run_reader(my_reader, e1)
## [1] 4
And it’s easy to do the same computational process in another environment:
e2 <- list(a = 4, b = 6, c = 7)
run_reader(my_reader, e2)
## [1] 6
Same process, different config, different answer.
So say we have the following functions:
fun_a <- function(x, e) {
x + e$a
}
fun_b <- function(x, e) {
x + e$b
}
fun_c <- function(x, e) {
x + e$c
}
Remember, we want to be able to build a complex computation out of smaller parts (i.e. we want to use composition) and we don’t want to have to pass e1
in each time.
For now, let’s curry these into corresponding readers:
reader_a <- function(x) {
function(e) {
x + e$a
}
}
reader_b <- function(x) {
function(e) {
x + e$b
}
}
reader_c <- function(x) {
function(e) {
x + e$c
}
}
We are going to define a new pipe that knows how to deal with these reader functions. We’ll call this pipe, “bind”.
`%>>=%` <- function(lhs, rhs) {
function(e) {
a <- run_reader(lhs, e)
rb <- rhs(a)
run_reader(rb, e)
}
}
This is a little complex, so let’s break it down:
- This is going to be used as an infix function, we’re going to expect a reader on the left hand side (e.g.
reader_a(2)
) and the rest of the process on the right (some other function relying on the environment, i.e. another reader). - Note that bind is going to return a function that accepts some environment, i.e. the output will itself be a reader.
- But this isn’t quite like the other readers we’ve already defined:
- once it is passed an environment, it runs the left hand side reader in that environment and stores the value in a variable,
a
. - It then creates a new reader with an input of
a
. - The return value is this new reader evaluated in the environment passed in to the inner function.
- once it is passed an environment, it runs the left hand side reader in that environment and stores the value in a variable,
We’ll define one other function first.
I’d like to call this return
for reasons that I won’t go into here, but return
is a defined keyword in R, so let’s just stick with pure
for now, again for reasons that I won’t go into.
pure <- function(x) {
function(e) {
x
}
}
This is a way of injecting our input into the reader context. Essentially this makes a reader function (a function accepting an environment), that ignores its environment and just returns the value. Not that useful in general, but very useful for starting off a composition of readers.
Let’s take a look at an example:
run_reader(pure(2) %>>=% reader_a, e1)
## [1] 4
Let’s break down the first argument, i.e. the definition of the reader. We know we can rewrite this as:
`%>>=%`(pure(2), reader_a)
## function(e) {
## a <- run_reader(lhs, e)
## rb <- rhs(a)
## run_reader(rb, e)
## }
## <bytecode: 0x55d314742d78>
## <environment: 0x55d3145e5460>
Using the definition of %>>=%
, this expands to:
function(e) {
a <- run_reader(pure(2), e)
rb <- reader_a(a)
run_reader(rb, e)
}
We know that run_reader(pure(2), e)
, must return 2
, regardless of the environment, e
, as pure
ignores the environment and returns the value.
We substitute:
function(e) {
a <- 2
rb <- reader_a(2)
run_reader(rb, e)
}
This then boils down to:
function(e) {
run_reader(reader_a(2), e)
}
If we pass in e1
as e
, we get:
run_reader(reader_a(2), e1)
## [1] 4
As expected.
This one is a little more complex:
run_reader(pure(2) %>>=% reader_a %>>=% reader_b, e1)
## [1] 8
Again, let’s take the reader part:
pure(2) %>>=% reader_a %>>=% reader_b
It turns out that it only works if we have left-associativity i.e. we can parcel it up like
(pure(2) %>>=% reader_a) %>>=% reader_b
but not
pure(2) %>>=% (reader_a %>>=% reader_b)
Luckily, we already know what
pure(2) %>>=% reader_a
gives:
function(e) {
run_reader(reader_a(2), e)
}
Which, for simplicity, I’ll rewrite as:
\(e) {reader_a(2)(e)}
So substituting this in to the definition of %>>=%
:
function(e) {
a <- run_reader(\(e) {reader_a(2)(e)}, e)
rb <- reader_b(a)
run_reader(rb, e)
}
Expanding:
function(e) {
run_reader(reader_b(reader_a(2)(e)), e)
}
Or
function(e) {
reader_b(reader_a(2)(e))(e)
}
Then using the definition of reader_a()
:
function(e) {
reader_b(2 + e$a)(e)
}
Then the definition of reader_b()
:
function(e) {
2 + e$a + e$b
}
So now we’ve gone through all of that to show our composition of a readers pure(2) %>>=% reader_a %>>=% reader_b
is itself a reader function (a function that takes an e
environment and evaluates some computation in that environment).
So:
run_reader(pure(2) %>>=% reader_a %>>=% reader_b, e1)
evaluates to 8 as expected:
2 + e1$a + e2$a
## [1] 8
Cleaning things up
That did get a little complicated. Let’s recap where we’ve got to, and why we started in the first place:
- Two things are important:
- pure functions are easier to deal with
- composition is powerful
- Without resorting to something fancy, if we want to be able to define a more complex calculation that refers to some config environment, we can have purity but not composition or composition without purity.
- But we curried our function and with a suitable definition of a special pipe infix function,
%>>=%
, we got both at the expense of having to userun_reader()
to run the calculation.
But I don’t want to have to write all my functions in the curried way.
I’d prefer to write single level functions, i.e. functions of two arguments and have the pipe handle the currying and uncurrying.
Luckily, that’s fairly simple with this small addition to %>>=%
:
`%>>=%` <- function(reader, rhs) {
function(e) {
a <- run_reader(reader, e)
rb = purrr::partial(rhs, a)
run_reader(rb, e)
}
}
Then we can use the uncurried fun_a()
, fun_b()
and fun_c()
:
run_reader(pure(2) %>>=% fun_a %>>=% fun_b, e1)
## [1] 8
The final thing we want to do is allow functions that don’t take a the environment. At the moment, we get an error:
just_adds_1 <- function(x) {
x + 1
}
run_reader(pure(2) %>>=% just_adds_1 %>>=% fun_b, e1)
## Error in (function (x) : unused argument (list(2, 4, 5))
Because we’re trying to force the environment into the function where there’s no where for it to go.
We need to write a handler for this function
wrapper <- function(f) {
function(x, e) {
f(x)
}
}
run_reader(pure(2) %>>=% wrapper(just_adds_1) %>>=% fun_b, e1)
## [1] 7
Why not just use with
?
We could have done this:
with(list(e = e1),
2 |> # pure(2) %>>=%
fun_a(e) |> # fun_a
fun_b(e)) # fun_b
## [1] 8
Which sort of lets us compose (at least in the pipe way, if not the mathematical, point-free way), and means there’s only one place to change the e
.
It’s also simpler to read, as you only need to get your head around with
rather than %>>=%
etc.
For actual use, probably go with with
or some other system for managing global configs.
But you should have learned by now that coming here for useful tips won’t always work out that way. Playing around with functions and evaluation and pipes exposes interesting new ways to program. This post was inspired by the Reader monad in Haskell. And it wasn’t really until I started reading Bartosz Milewski’s book Category Theory for Programmers and tried to implement it in R that I really started to get how it worked. Disclaimer: I’m sure someone more knowledgeable about it than I am can find some misunderstanding or some subtlety I’ve missed, so don’t take Reader monad advice from this blog, do your own research!
purrr::compose()
takes several functions and “stitches” them together, i.e. given anf1
and andf2
, it creates a new function that accepts input, passes it tof1
, then passes the output off1
intof2
and then returns the output off2
.purrr::compose(f1, f2)
is expressed in maths as \(f_2 \circ f_1\), or in words, “f2 after f1”↩︎strictly speaking, only if those functions are themselves pure. A function that calls an impure function cannot itself be pure.↩︎