R: using NSE with select helpers and default arguments

Introduction

Once you adopt the tidyverse philosophy in your R code, at some point you will start writing functions which invoke dplyr’s non-standard evaluation (NSE) mechanisms.

An example is the following. Consider this dataset:

library(tidyverse)

storms
name  year  month  day  hour  lat   long   status               category  wind  pressure  ts_diameter  hu_diameter
Amy   1975  6      27   0     27.5  -79.0  tropical depression  -1        25    1013      NA           NA
Amy   1975  6      27   6     28.5  -79.0  tropical depression  -1        25    1013      NA           NA
Amy   1975  6      27   12    29.5  -79.0  tropical depression  -1        25    1013      NA           NA
Amy   1975  6      27   18    30.5  -79.0  tropical depression  -1        25    1013      NA           NA
Amy   1975  6      28   0     31.5  -78.8  tropical depression  -1        25    1012      NA           NA
Amy   1975  6      28   6     32.4  -78.7  tropical depression  -1        25    1012      NA           NA
Amy   1975  6      28   12    33.3  -78.0  tropical depression  -1        25    1011      NA           NA
Amy   1975  6      28   18    34.0  -77.0  tropical depression  -1        30    1006      NA           NA
Amy   1975  6      29   0     34.4  -75.8  tropical storm       0         35    1004      NA           NA
Amy   1975  6      29   6     34.0  -74.8  tropical storm       0         40    1002      NA           NA

Suppose that you want to select all columns related to time (i.e. year, month, day, hour).
With the standard dplyr tools, this is done with dplyr::select:

storms_ymdh <- storms %>% select(year, month, day, hour)
storms_ymd <- storms %>% select(year, month, day)
storms_ym <- storms %>% select(year, month)
storms_y <- storms %>% select(year)

This can be encapsulated into a function, as usual:

fun_select <- function(df) {
   df %>% select(year, month, day, hour)
}

identical(storms_ymdh, storms %>% fun_select())
## [1] TRUE

select provides several helpers to indicate which variables are kept.
An example is one_of():


identical(storms_ymdh, 
          storms %>% select(one_of(c('year', 'month', 'day', 'hour'))) )
## [1] TRUE

fun_select_2 <- function(df) {
   df %>% select(one_of(c('year', 'month', 'day', 'hour')))
}

identical(storms_ymdh, storms %>% fun_select_2())
## [1] TRUE
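
As a side note, in more recent releases one_of() is marked as superseded in favour of all_of() and any_of(). The sketch below assumes dplyr 1.0.0 or later; fun_select_all_of is a hypothetical name used only for illustration. all_of() raises an error if a requested column is missing, while any_of() silently skips it.

```r
library(dplyr)

# Hypothetical variant of fun_select_2, assuming dplyr >= 1.0.0
fun_select_all_of <- function(df) {
   # all_of() takes a character vector and fails loudly on unknown columns
   df %>% select(all_of(c('year', 'month', 'day', 'hour')))
}

identical(storms %>% select(year, month, day, hour),
          storms %>% fun_select_all_of())
## [1] TRUE
```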

Adding arguments

Now, we want to generalize the mechanism, and allow the user to decide which variables are selected.

The easiest adaptation does not use NSE at all: we just pass the vector of column names to select:

fun_select_3 <- function(df, cols) {
   df %>% select(one_of(cols))
}

identical(storms_ymdh, storms %>% fun_select_3(c('year', 'month', 'day', 'hour')))
## [1] TRUE
identical(storms_ymd, storms %>% fun_select_3(c('year', 'month', 'day')))
## [1] TRUE

Enter the NSE

By reading up on NSE, we can make fun_select_3 behave as select behaves, without explicitly quoting the argument:

fun_select_NSE <- function(df, ...) {
   dots <- enquos(...)
   df %>% select(!!!dots)
}

identical(storms_ymdh, storms %>% fun_select_NSE(year, month, day, hour))
## [1] TRUE
identical(storms_ymd, storms %>% fun_select_NSE(year, month, day))
## [1] TRUE

It also works with all select helpers:

# one_of
identical(storms_ymd, storms %>% fun_select_NSE(one_of(c('year', 'month', 'day'))) )
## [1] TRUE
identical(storms_ymd, storms %>% select(one_of(c('year', 'month', 'day'))) )
## [1] TRUE

# Column range
identical(storms_ymdh, storms %>% fun_select_NSE(year:hour))
## [1] TRUE

Why it works

The three-dot parameter ... captures all arguments (see note 1) passed to fun_select_NSE after df. Then enquos quotes them, returning a list of quosures and blocking evaluation until an unquoting operation appears.
In this case, evaluation resumes when !!! is encountered: that operator inserts the captured expressions into the call to select, as if the user had typed them there.

Also, !!! performs a splicing operation: the elements of the list are added as separate arguments to the surrounding function call, separated by commas. (It is similar to Python’s unpacking operator *.) If we were to accept a single argument instead, we could have just used enquo and !!.
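
A minimal sketch of that single-argument variant follows. The name fun_select_one is hypothetical and only serves the illustration; the pattern is the same as above, with enquo and !! replacing enquos and !!!.

```r
library(dplyr)

# Hypothetical single-column wrapper: `col` is captured unevaluated by enquo()
# and inserted into the select() call with !!
fun_select_one <- function(df, col) {
   col <- enquo(col)
   df %>% select(!!col)
}

identical(storms %>% select(year), storms %>% fun_select_one(year))
## [1] TRUE
```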

Notice that the mechanism works for any kind of expression, like one_of(c('year', 'month')).
First, ... is bound to the user’s expression, which is not evaluated because enquos quotes it. Then !!! writes the expression back as select’s argument.
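
To make the capture and splice steps more tangible, here is a small sketch that inspects what enquos stores and where !!! puts it. capture_dots is a hypothetical helper, and df in the last call is just a placeholder symbol that is never evaluated.

```r
library(rlang)

# Hypothetical helper that only captures its dots, without using them
capture_dots <- function(...) enquos(...)

qs <- capture_dots(year, one_of(c('year', 'month')))

length(qs)              # one quosure per argument passed
quo_get_expr(qs[[1]])   # the symbol `year`, not its value
quo_get_expr(qs[[2]])   # the unevaluated call to one_of()

# exprs() captures bare expressions; splicing them into expr() shows
# that !!! inserts each element as a separate argument
spliced <- expr(select(df, !!!exprs(year, month)))
deparse(spliced)
## [1] "select(df, year, month)"
```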

To learn more about tidyverse’s quo/enquo/quotation/quasiquotation/!!, please read this wonderful post!
More R-ready material is available in Hadley Wickham’s Advanced R, and the quasiquotation page in rlang-package documentation.

Modifying default parameters with NSE

However, we would like to provide a default which performs some selection, e.g. select only year if no argument is passed.

It is very easy to do for the non-NSE functions, if you stick to strings:

fun_select_3_default <- function(df, cols_default = 'year') {
   df %>% select(one_of(cols_default))
}

identical(storms_ymdh, storms %>% fun_select_3_default(c('year', 'month', 'day', 'hour')) )
## [1] TRUE
identical(storms_y, storms %>% fun_select_3_default() )
## [1] TRUE

If you want to pass more complicated expressions, however, the NSE version is a little trickier:

fun_select_NSE_default <- function(df, ...) {
      
   dots <- enquos(...)

   if (length(dots) != 0) {
      # What is passed if dots are not empty
      select_var_true <- dots
   } else {
      # Default value: year. quos() (not quo()) returns a list of quosures,
      # which is what !!! expects; splicing a single quosure built with quo()
      # is deprecated as of rlang 0.4.0.
      select_var_true <- quos(year)
   }
   
   df %>% select(!!!select_var_true)
}
identical(storms_ym, storms %>% fun_select_NSE_default(year, month))
## [1] TRUE
identical(storms_y, storms %>% fun_select_NSE_default())
## [1] TRUE

We can also use rlang::quo_is_missing, which returns TRUE if a quosure holds a missing argument. When nothing is passed in place of ..., dots is an empty list, so all() over it is TRUE and the default is used:

fun_select_NSE_default_2 <- function(df, ...) {
      
   dots <- enquos(...)

   if (all(purrr::map_lgl(dots, rlang::quo_is_missing))) {
      # Default value: year
      select_var_true <- quos(year)
   } else {
      # What is passed if dots are not empty
      select_var_true <- dots
   }
   
   df %>% select(!!!select_var_true)
}
identical(storms_ym, storms %>% fun_select_NSE_default_2(year, month))
## [1] TRUE
identical(storms_y, storms %>% fun_select_NSE_default_2())
## [1] TRUE

Note that purrr::map_lgl is necessary, since we are dealing with multiple comma-separated arguments (captured by the dots): rlang::quo_is_missing works only with a single quosure, while dots is a list of quosures!
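
The following sketch shows why the all(map_lgl(...)) test fires when the dots are empty. capture_dots and capture_one are hypothetical helpers introduced only for this illustration.

```r
library(rlang)
library(purrr)

# Hypothetical helpers: one captures the dots, the other a single parameter
capture_dots <- function(...) enquos(...)
capture_one  <- function(x) enquo(x)

empty <- capture_dots()
length(empty)                             # 0: no quosures were captured

flags <- map_lgl(empty, quo_is_missing)   # an empty logical vector
all(flags)                                # TRUE: all() of nothing is TRUE

# With a named parameter there is always exactly one quosure to test
quo_is_missing(capture_one())             # TRUE: nothing was passed
quo_is_missing(capture_one(year))         # FALSE
```

In other words, with empty dots the default branch is taken because all() returns TRUE for an empty vector, not because any quosure was found to be missing.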

For a single parameter, one can also skip ... and directly use a named parameter that holds the quoted argument:

fun_select_NSE_default_3 <- function(df, select_var) {
      
   select_var_true <- enquo(select_var)

   if (rlang::quo_is_missing(select_var_true)) {
      # Default value: year, month
      select_var_true <- quo(one_of(c('year', 'month')))
   }
   df %>% select(!!select_var_true)
}
identical(storms_ym, storms %>% fun_select_NSE_default_3())
## [1] TRUE
identical(storms_ym, storms %>% fun_select_NSE_default_3(one_of(c('year', 'month'))))
## [1] TRUE

We can also set defaults which are more complex than a single variable name, e.g. a select helper and a column name, passed as two arguments:

fun_select_NSE_default_4 <- function(df, ...) {
      
   dots <- enquos(...)

   if (all(map_lgl(dots, rlang::quo_is_missing))) {
      # Default value: year, month, day
      select_var_true <- quos(one_of(c('year', 'month')), day)
   } else {
      # What is passed if dots are not empty
      select_var_true <- dots
   }
   df %>% select(!!!select_var_true)
}
identical(storms_ymd, storms %>% fun_select_NSE_default_4(one_of(c('year', 'month')), day))
## [1] TRUE
identical(storms_ymd, storms %>% fun_select_NSE_default_4())
## [1] TRUE

Notice the usage of quos instead of quo: quos returns a list of quosures, which is what !!! splices, and here we are quoting two expressions, not one!

Thanks for reading!


  1. parameter: what is declared, argument: what is actually passed.
