R: using NSE with select helpers and default arguments
Introduction
Once you adopt the tidyverse philosophy into your R code, at a certain time you will start writing functions which invoke dplyr’s non-standard evaluation (NSE) mechanisms.
An example is the following. Consider this dataset:
library(tidyverse)
storms
name | year | month | day | hour | lat | long | status | category | wind | pressure | ts_diameter | hu_diameter |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Amy | 1975 | 6 | 27 | 0 | 27.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 27 | 6 | 28.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 27 | 12 | 29.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 27 | 18 | 30.5 | -79.0 | tropical depression | -1 | 25 | 1013 | NA | NA |
Amy | 1975 | 6 | 28 | 0 | 31.5 | -78.8 | tropical depression | -1 | 25 | 1012 | NA | NA |
Amy | 1975 | 6 | 28 | 6 | 32.4 | -78.7 | tropical depression | -1 | 25 | 1012 | NA | NA |
Amy | 1975 | 6 | 28 | 12 | 33.3 | -78.0 | tropical depression | -1 | 25 | 1011 | NA | NA |
Amy | 1975 | 6 | 28 | 18 | 34.0 | -77.0 | tropical depression | -1 | 30 | 1006 | NA | NA |
Amy | 1975 | 6 | 29 | 0 | 34.4 | -75.8 | tropical storm | 0 | 35 | 1004 | NA | NA |
Amy | 1975 | 6 | 29 | 6 | 34.0 | -74.8 | tropical storm | 0 | 40 | 1002 | NA | NA |
Suppose that you want to select all columns related to time (i.e. year
, month
, day
, hour
).
Using the standard dplyr tools, this is done using dplyr::select
:
storms_ymdh <- storms %>% select(year, month, day, hour)
storms_ymd <- storms %>% select(year, month, day)
storms_ym <- storms %>% select(year, month)
storms_y <- storms %>% select(year)
This can be incapsulated into a function, as usual:
fun_select <- function(df) {
df %>% select(year, month, day, hour)
}
identical(storms_ymdh, storms %>% fun_select())
## [1] TRUE
select
provides several helpers to indicate which variables are kept.
An example is one_of()
:
identical(storms_ymdh,
storms %>% select(one_of(c('year', 'month', 'day', 'hour'))) )
## [1] TRUE
fun_select_2 <- function(df) {
df %>% select(one_of(c('year', 'month', 'day', 'hour')))
}
identical(storms_ymdh, storms %>% fun_select_2())
## [1] TRUE
Adding arguments
Now, we want to generalize the mechanism, and allow the user to decide which variables are selected.
The easiest adaptation does not use NSE at all, we just pass the vector of columns to select:
fun_select_3 <- function(df, cols) {
df %>% select(one_of(cols))
}
identical(storms_ymdh, storms %>% fun_select_3(c('year', 'month', 'day', 'hour')))
## [1] TRUE
identical(storms_ymd, storms %>% fun_select_3(c('year', 'month', 'day')))
## [1] TRUE
Enter the NSE
By reading up on NSE, we can make fun_select_3
behave as select
behaves, without explicitly quoting the argument:
fun_select_NSE <- function(df, ...) {
dots <- enquos(...)
df %>% select(!!!dots)
}
identical(storms_ymdh, storms %>% fun_select_NSE(year, month, day, hour))
## [1] TRUE
identical(storms_ymd, storms %>% fun_select_NSE(year, month, day))
## [1] TRUE
It also works with all select
helpers:
# one_of
identical(storms_ymd, storms %>% fun_select_NSE(one_of(c('year', 'month', 'day'))) )
## [1] TRUE
identical(storms_ymd, storms %>% select(one_of(c('year', 'month', 'day'))) )
## [1] TRUE
# Column range
identical(storms_ymdh, storms %>% fun_select_NSE(year:hour))
## [1] TRUE
Why it works
The three-dot parameter ...
captures all arguments1 of fun_select_NSE
.
Then enquos
quotes the list, essentially blocking evaluation until an unquoting operation appears.
In this case, the evaluation continues when the !!!
is encountered: that operator inserts the names in place of the quoted variables.
Also, !!!
performs a splicing operation: arguments are added as to the surrounding function, separated by commas. (It is similar to Python’s unpacking operator *
). If we were to accept a single argument, instead, we could have just used enquo
and !!
.
Notice that the mechanism works for any kind of expression, like one_of(c('year', 'month'))
.
First, ...
are set to be equal to the argument, but it is not evaluated because of enquos
effect.
!!!
, then, writes back the user’s expression as select
’s argument.
To know more about tidyverse
’s quo
/enquo
/quotation/quasiquotation/!!
, please read this wonderful post!
More R-ready material is available in Hadley Wickham’s Advanced R, and the quasiquotation page in rlang-package documentation.
Modifying default parameters with NSE
However, we would like to provide a default parameter which performs some selection.
E.g. select only year
if no argument is passed.
It is very easy to do for the non-NSE functions, if you stick to strings:
fun_select_3_default <- function(df, cols_default = 'year') {
df %>% select(one_of(cols_default))
}
identical(storms_ymdh, storms %>% fun_select_3_default(c('year', 'month', 'day', 'hour')) )
## [1] TRUE
identical(storms_y, storms %>% fun_select_3_default() )
## [1] TRUE
If you want to pass more complicated expressions, however, the NSE version is a little trickier:
fun_select_NSE_default <- function(df, ...) {
dots <- enquos(...)
if (length(dots) != 0) {
# What is passed if dots are not empty
select_var_true <- dots
} else {
# Default value: year
select_var_true <- quo(year)
}
df %>% select(!!!select_var_true)
}
identical(storms_ym, storms %>% fun_select_NSE_default(year, month))
## [1] TRUE
identical(storms_y, storms %>% fun_select_NSE_default())
## Warning: Unquoting language objects with `!!!` is deprecated as of rlang 0.4.0.
## Please use `!!` instead.
##
## # Bad:
## dplyr::select(data, !!!enquo(x))
##
## # Good:
## dplyr::select(data, !!enquo(x)) # Unquote single quosure
## dplyr::select(data, !!!enquos(x)) # Splice list of quosures
##
## This warning is displayed once per session.
## [1] TRUE
We can also use rlang::quo_is_missing
, which returns TRUE
if the quosure is empty (in this case, when nothing lies in the place of ...
):
fun_select_NSE_default_2 <- function(df, ...) {
dots <- enquos(...)
if (all(purrr::map_lgl(dots, rlang::quo_is_missing))) {
# Default value: year
select_var_true <- quo(year)
} else {
# What is passed if dots are not empty
select_var_true <- dots
}
df %>% select(!!!select_var_true)
}
identical(storms_ym, storms %>% fun_select_NSE_default_2(year, month))
## [1] TRUE
identical(storms_y, storms %>% fun_select_NSE_default_2())
## [1] TRUE
Note that purrr::map
is necessary, since we are dealing with multiple comma-separated parameters (captured by the dots).
rlang::quo_is_missing
works only with a single quosure, dots
is a list of quosures!
For single parameters, one can also skip ...
and use directly a variable which holds the quoted arguments:
fun_select_NSE_default_3 <- function(df, select_var) {
select_var_true <- enquo(select_var)
if (rlang::quo_is_missing(select_var_true)) {
# Default value: year, month
select_var_true <- quo(one_of(c('year', 'month')))
}
df %>% select(!!select_var_true)
}
identical(storms_ym, storms %>% fun_select_NSE_default_3())
## [1] TRUE
identical(storms_ym, storms %>% fun_select_NSE_default_3(one_of(c('year', 'month'))))
## [1] TRUE
We can also set default parameters which are more complex than a single variable name, e.g., a select helper split in two arguments:
fun_select_NSE_default_4 <- function(df, ...) {
dots <- enquos(...)
if (all(map_lgl(dots, rlang::quo_is_missing))) {
# Default value: year, month
select_var_true <- quos(one_of(c('year', 'month')), day)
} else {
# What is passed if dots are not empty
select_var_true <- dots
}
df %>% select(!!!select_var_true)
}
identical(storms_ymd, storms %>% fun_select_NSE_default_4(one_of(c('year', 'month')), day))
## [1] TRUE
identical(storms_ymd, storms %>% fun_select_NSE_default_4())
## [1] TRUE
Notice the usage of quos
instead of quo
: there we are quoting two parameters, not one!
Thanks for reading!
parameter: what is declared, argument: what is actually passed.↩