Skip to content

Commit

Permalink
Add .by to fill() (#1572)
Browse files Browse the repository at this point in the history
* Add .by to `fill()`

* Add news

* Add it back to the generic

* Add extra validation when `.by` is supplied and a group data frame (repeat the error message from mutate() for a better stack trace.

* Move `.by` to the data frame method, remove `.by` handling, misc tweaks

* Go back to `.by` in the generic of `fill()`

---------

Co-authored-by: Davis Vaughan <[email protected]>
  • Loading branch information
olivroy and DavisVaughan authored Aug 28, 2024
1 parent 454a4c8 commit 622afe8
Show file tree
Hide file tree
Showing 5 changed files with 118 additions and 18 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# tidyr (development version)

* `fill()` gains a `.by` argument as an alternative to `dplyr::group_by()` for
applying the fill per group, similar to `nest(.by =)` and
`dplyr::mutate(.by =)` (@olivroy, #1439).

* `unchop()` produces a more helpful error message when columns cannot be cast
to `ptype` (@mgirlich, #1477).

Expand Down
40 changes: 31 additions & 9 deletions R/fill.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,17 @@
#' @section Grouped data frames:
#' With grouped data frames created by [dplyr::group_by()], `fill()` will be
#' applied _within_ each group, meaning that it won't fill across group
#' boundaries.
#' boundaries. This can also be accomplished using the `.by` argument to
#' `fill()`, which creates a temporary grouping for just this operation.
#'
#' @inheritParams dplyr::mutate
#'
#' @param data A data frame.
#' @param ... <[`tidy-select`][tidyr_tidy_select]> Columns to fill.
#' @param .direction Direction in which to fill missing values. Currently
#' either "down" (the default), "up", "downup" (i.e. first down and then up)
#' or "updown" (first up and then down).
#'
#' @export
#' @examples
#' # direction = "down" --------------------------------------------------------
Expand Down Expand Up @@ -82,22 +86,36 @@
#' 3, "Danielle", "Observer", NA
#' )
#'
#' # The values are inconsistently missing by position within the group
#' # Use .direction = "downup" to fill missing values in both directions
#' # The values are inconsistently missing by position within the `group`.
#' # Use `.direction = "downup"` to fill missing values in both directions
#' # and `.by = group` to apply the fill per group.
#' squirrels %>%
#' fill(n_squirrels, .direction = "downup", .by = group)
#'
#' # If you want, you can also supply a data frame grouped with `group_by()`,
#' # but don't forget to `ungroup()`!
#' squirrels %>%
#' dplyr::group_by(group) %>%
#' fill(n_squirrels, .direction = "downup") %>%
#' dplyr::ungroup()
#'
#' # Using `.direction = "updown"` accomplishes the same goal in this example
fill <- function(data, ..., .direction = c("down", "up", "downup", "updown")) {
fill <- function(data,
...,
.by = NULL,
.direction = c("down", "up", "downup", "updown")) {
check_dots_unnamed()
UseMethod("fill")
}

#' @export
fill.data.frame <- function(data, ..., .direction = c("down", "up", "downup", "updown")) {
vars <- tidyselect::eval_select(expr(c(...)), data, allow_rename = FALSE)
fill.data.frame <- function(data,
...,
.by = NULL,
.direction = c("down", "up", "downup", "updown")) {
vars <- names(tidyselect::eval_select(
expr = expr(c(...)),
data = data,
allow_rename = FALSE
))

.direction <- arg_match0(
arg = .direction,
Expand All @@ -108,5 +126,9 @@ fill.data.frame <- function(data, ..., .direction = c("down", "up", "downup", "u
vec_fill_missing(col, direction = .direction)
}

dplyr::mutate_at(data, .vars = dplyr::vars(any_of(vars)), .funs = fn)
dplyr::mutate(
data,
dplyr::across(any_of(vars), .fns = fn),
.by = {{ .by }}
)
}
23 changes: 17 additions & 6 deletions man/fill.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 27 additions & 0 deletions tests/testthat/_snaps/fill.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
# errors on named `...` inputs

Code
fill(df, fooy = x)
Condition
Error in `fill()`:
! Arguments in `...` must be passed by position, not name.
x Problematic argument:
* fooy = x

# validates its inputs

Code
Expand All @@ -6,3 +16,20 @@
Error in `fill()`:
! `.direction` must be one of "down", "up", "downup", or "updown", not "foo".

# `.by` can't select columns that don't exist

Code
fill(df, y, .by = z)
Condition
Error in `dplyr::mutate()`:
! Can't select columns that don't exist.
x Column `z` doesn't exist.

# `.by` can't be used on a grouped data frame

Code
fill(df, y, .by = x)
Condition
Error in `dplyr::mutate()`:
! Can't supply `.by` when `.data` is a grouped data frame.

42 changes: 39 additions & 3 deletions tests/testthat/test-fill.R
Original file line number Diff line number Diff line change
Expand Up @@ -106,23 +106,59 @@ test_that("fill preserves attributes", {
expect_equal(attributes(out_u$x), attributes(df$x))
})

test_that("fill respects grouping", {
test_that("fill respects existing grouping and `.by`", {
df <- tibble(x = c(1, 1, 2), y = c(1, NA, NA))

out <- df %>%
dplyr::group_by(x) %>%
fill(y)
expect_equal(out$y, c(1, 1, NA))
expect_identical(out$y, c(1, 1, NA))
expect_identical(dplyr::group_vars(out), "x")

out <- df %>%
fill(y, .by = x)
expect_identical(out$y, c(1, 1, NA))
expect_identical(dplyr::group_vars(out), character())
})

test_that("works when there is a column named `.direction` in the data (#1319)", {
df <- tibble(x = c(1, NA, 2), .direction = 1:3)
expect_error(out <- fill(df, x), NA)
out <- fill(df, x)
expect_identical(out$x, c(1, 1, 2))
})

test_that("errors on named `...` inputs", {
df <- tibble(x = c(1, NA, 2))

expect_snapshot(error = TRUE, {
fill(df, fooy = x)
})
})

test_that("validates its inputs", {
df <- tibble(x = c(1, NA, 2))
expect_snapshot(error = TRUE, {
df %>% fill(x, .direction = "foo")
})
})

test_that("`.by` can't select columns that don't exist", {
df <- tibble(x = 1, y = 2)

# This shows `mutate()` in the error, but we accept that to not have to handle
# `.by` in any way.
expect_snapshot(error = TRUE, {
fill(df, y, .by = z)
})
})

test_that("`.by` can't be used on a grouped data frame", {
df <- tibble(x = 1, y = 2)
df <- dplyr::group_by(df, x)

# This shows `mutate()` in the error, but we accept that to not have to handle
# `.by` in any way.
expect_snapshot(error = TRUE, {
fill(df, y, .by = x)
})
})

0 comments on commit 622afe8

Please sign in to comment.