Use `tidyselect::eval_select(allow_positional = FALSE)` for `id_cols` when possible #1541

bpcolman · 2024-02-15T00:54:45Z

In going through some old code that I hadn't touched in a while, I found that pivot_wider was not working and was giving me the following error:

#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

Digging deeper, I think I've sorted out the overarching timing of the issue, and it is due to changes made between tidyr 1.1.4 and 1.2.0 that changes the behavior, and between versions 1.2.1 and 1.3.0 where the error message changed to the above. I've included a reprex below that shows behavior in ver 1.1.4 and 1.3.1:

# Install `dplyr` and `devtools` if they aren't already:
list.of.packages <- c("dplyr", "devtools")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

# Load dplyr and devtools packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(devtools)
#> Loading required package: usethis


# Record installed version of tidyr if there is one, to allow reinstallation of
# the current version, or provide an error if there is not one:
tidy_ver <- packageVersion("tidyr") # record the version number of existing tidyr

detach("package:tidyr", unload = TRUE)
#> Error in detach("package:tidyr", unload = TRUE): invalid 'name' argument

remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = "1.1.4", 
                repos = "http://cran.us.r-project.org",
                quiet = TRUE)

library(tidyr)

# Set a seed for consistency
set_seed = 312

# Make some data (albeit with not the tidiest structure)

(df <- data.frame(site = c(rep((1:3), each = 2)),
                 time = rep(c("day", "night"), 3),
                 temp = rnorm(6, rep(c(20, 10), each = 1), 4),
                 day_of_exp = c(rep(1, 2), 
                                rep(11, 2),
                                rep(21, 2))))
#>   site  time      temp day_of_exp
#> 1    1   day 24.681337          1
#> 2    1 night  2.646830          1
#> 3    2   day 19.473059         11
#> 4    2 night 10.124520         11
#> 5    3   day 23.304546         21
#> 6    3 night  7.414085         21

# Pivot data based on site (column 1) and day_of_exp (column 4) using column
# numbers:

df %>% pivot_wider(id_cols = c(1, 4),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# Works well. We can also Pivot data based on site and day_of_exp using
# column names:

df %>% pivot_wider(id_cols = c(site, day_of_exp),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# If for some reason, we include a variable that doesn't exist in the dataset,
# we get a simple error message:

df %>%
  pivot_wider(id_cols = c(1, 5),
              names_from = time,
              values_from = temp)
#> Error in `pivot_wider_spec()`:
#> ! Can't subset columns past the end.
#> ℹ Location 5 doesn't exist.
#> ℹ There are only 4 columns.

# So that gives an error message, and the error message makes it clear that the
# error is caused by choosing a column that doesn't exist. 

# Working with `tidyr` v. 1.3.1, things don't work in the same way.

# Detach and uninstall `tidyr`1.1.3 and install and load 1.3.1:

detach("package:tidyr", unload = TRUE)
remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = "1.3.1", 
                repos = "http://cran.us.r-project.org", 
                quiet = TRUE)

library("tidyr")

# In ver 1.3.1, se can try to pivot data wider based on site (column 1) and
# day_of_exp (column 4) using column numbers:

df %>% pivot_wider(id_cols = c(1, 4),
                   names_from = time,
                   values_from = temp)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

# It does not work, and we get a new error message. We can still use names:

df %>% pivot_wider(id_cols = c(site, day_of_exp),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# If we include a variable that doesn't exist in the dataset, we get the same
# error message as when we tried to call the 4th column:

df %>%
  pivot_wider(id_cols = c(1, 5),
              names_from = time,
              values_from = temp)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

# It appears that the cause of this error when calling "id_cols" by number is
# that pivot_wider calls the columns from an object where the columns that are
# called in the "names_from" or "values_from" arguments have already been
# removed, leaving a object in this case that that has dimensions 6 x 2 instead
# of 6 x 4. This is corroborated if we call columns 1 and 2, which should call
# our site and day_of_exp variables if time and temp are removed:

df %>%
  pivot_wider(id_cols = c(1, 2),
              names_from = time,
              values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# This works, suggesting that this is where the error lies. 

# Detach library, remove tidyr, and reinstall original version if it was already
# installed:

detach("package:tidyr", unload = TRUE)

remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = tidy_ver, 
                repos = "http://cran.us.r-project.org",
                quiet = TRUE)

^{Created on 2024-02-15 with reprex v2.1.0}

A workaround is to use the column names in the id_cols argument; however, given older versions of the pivot_wider function allowed the use of column numbers and there is an increased potential for error when entering names, it would be nice if pivot_wider functioned like it did in tidyr 1.1.4 and before. I would also advocate for a return to the more descriptive error message when choosing variables that don't exist if that is possible, and including a descriptive warning/error if one attempts to choose a variable that is included in the names_from or values_from arguments.

If the removal of straightforward selection by column number is a feature and not a bug, I would request an update for the help and vignette to make this even clearer.

The text was updated successfully, but these errors were encountered:

rion-saeon · 2024-04-11T13:04:01Z

In my case, I could specify id_cols like this and it worked. All indexes (col numbers) did not (i.e., when month_n is specified as 8)

pivot_wider(id_cols = 
                c(1:4, month_n), 
              names_from = "records_or_cards", 
              values_from = value)

DavisVaughan · 2024-10-24T14:56:13Z

Minimal reprex

library(tidyr)

# Set a seed for consistency
set_seed = 312

df <- data.frame(
  site = c(rep((1:3), each = 2)),
  time = rep(c("day", "night"), 3),
  temp = rnorm(6, rep(c(20, 10), each = 1), 4),
  day_of_exp = c(rep(1, 2), rep(11, 2), rep(21, 2))
)

# Columns 2 (time) and 3 (temp) are removed before evaluating the tidyselection
# for `id_cols`
df %>% pivot_wider(
  id_cols = c(1, 4),
  names_from = time,
  values_from = temp
)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

Comes from:

I think it is technically possible that we could evaluate the id_cols tidyselection in the context of the full data frame, and then do a post-hoc check that it doesn't collide with any names specified by names_from or values_from (which is why we implement it this way in the first place). Actually this would not work, because then everything() would also cause the error to get thrown, and we don't want that.

But I'd argue that using positions to perform the selection here is actually not a great thing to do. You should really prefer selecting with names. Ideally I think we'd keep the current behavior but prevent selecting by position through eval_select()

We've decided to wait for eval_select(allow_positional = FALSE) and use that for id_cols when we can to throw an informative error in this case and force the user to use names.

rion-saeon · 2024-10-25T10:04:08Z

@DavisVaughan the reason I want to use positions is that it seems more parsimonious and for the lazy coder like me. However, using indices are more risky than specifying names.

DavisVaughan mentioned this issue Oct 24, 2024

Option for eval_select(allow_positional = FALSE) r-lib/tidyselect#363

Open

DavisVaughan changed the title ~~Change in pivot_wider behavior and error messages when using column numbers for id_cols~~ Use tidyselect::eval_select(allow_positional = FALSE) for id_cols when possible Oct 24, 2024

DavisVaughan added feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes" labels Oct 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `tidyselect::eval_select(allow_positional = FALSE)` for `id_cols` when possible #1541

Use `tidyselect::eval_select(allow_positional = FALSE)` for `id_cols` when possible #1541

bpcolman commented Feb 15, 2024 •

edited

Loading

rion-saeon commented Apr 11, 2024 •

edited

Loading

DavisVaughan commented Oct 24, 2024 •

edited

Loading

rion-saeon commented Oct 25, 2024 •

edited

Loading

Use tidyselect::eval_select(allow_positional = FALSE) for id_cols when possible #1541

Use tidyselect::eval_select(allow_positional = FALSE) for id_cols when possible #1541

Comments

bpcolman commented Feb 15, 2024 • edited Loading

rion-saeon commented Apr 11, 2024 • edited Loading

DavisVaughan commented Oct 24, 2024 • edited Loading

rion-saeon commented Oct 25, 2024 • edited Loading

Use `tidyselect::eval_select(allow_positional = FALSE)` for `id_cols` when possible #1541

Use `tidyselect::eval_select(allow_positional = FALSE)` for `id_cols` when possible #1541

bpcolman commented Feb 15, 2024 •

edited

Loading

rion-saeon commented Apr 11, 2024 •

edited

Loading

DavisVaughan commented Oct 24, 2024 •

edited

Loading

rion-saeon commented Oct 25, 2024 •

edited

Loading