Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tidyselect::eval_select(allow_positional = FALSE) for id_cols when possible #1541

Open
bpcolman opened this issue Feb 15, 2024 · 3 comments
Labels
feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes"

Comments

@bpcolman
Copy link

bpcolman commented Feb 15, 2024

In going through some old code that I hadn't touched in a while, I found that pivot_wider was not working and was giving me the following error:

#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

Digging deeper, I think I've sorted out the overarching timing of the issue, and it is due to changes made between tidyr 1.1.4 and 1.2.0 that changes the behavior, and between versions 1.2.1 and 1.3.0 where the error message changed to the above. I've included a reprex below that shows behavior in ver 1.1.4 and 1.3.1:

# Install `dplyr` and `devtools` if they aren't already:
list.of.packages <- c("dplyr", "devtools")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

# Load dplyr and devtools packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(devtools)
#> Loading required package: usethis


# Record installed version of tidyr if there is one, to allow reinstallation of
# the current version, or provide an error if there is not one:
tidy_ver <- packageVersion("tidyr") # record the version number of existing tidyr

detach("package:tidyr", unload = TRUE)
#> Error in detach("package:tidyr", unload = TRUE): invalid 'name' argument

remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = "1.1.4", 
                repos = "http://cran.us.r-project.org",
                quiet = TRUE)

library(tidyr)

# Set a seed for consistency
set_seed = 312

# Make some data (albeit with not the tidiest structure)

(df <- data.frame(site = c(rep((1:3), each = 2)),
                 time = rep(c("day", "night"), 3),
                 temp = rnorm(6, rep(c(20, 10), each = 1), 4),
                 day_of_exp = c(rep(1, 2), 
                                rep(11, 2),
                                rep(21, 2))))
#>   site  time      temp day_of_exp
#> 1    1   day 24.681337          1
#> 2    1 night  2.646830          1
#> 3    2   day 19.473059         11
#> 4    2 night 10.124520         11
#> 5    3   day 23.304546         21
#> 6    3 night  7.414085         21

# Pivot data based on site (column 1) and day_of_exp (column 4) using column
# numbers:

df %>% pivot_wider(id_cols = c(1, 4),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# Works well. We can also Pivot data based on site and day_of_exp using
# column names:

df %>% pivot_wider(id_cols = c(site, day_of_exp),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# If for some reason, we include a variable that doesn't exist in the dataset,
# we get a simple error message:

df %>%
  pivot_wider(id_cols = c(1, 5),
              names_from = time,
              values_from = temp)
#> Error in `pivot_wider_spec()`:
#> ! Can't subset columns past the end.
#> ℹ Location 5 doesn't exist.
#> ℹ There are only 4 columns.

# So that gives an error message, and the error message makes it clear that the
# error is caused by choosing a column that doesn't exist. 

# Working with `tidyr` v. 1.3.1, things don't work in the same way.

# Detach and uninstall `tidyr`1.1.3 and install and load 1.3.1:

detach("package:tidyr", unload = TRUE)
remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = "1.3.1", 
                repos = "http://cran.us.r-project.org", 
                quiet = TRUE)

library("tidyr")

# In ver 1.3.1, se can try to pivot data wider based on site (column 1) and
# day_of_exp (column 4) using column numbers:

df %>% pivot_wider(id_cols = c(1, 4),
                   names_from = time,
                   values_from = temp)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

# It does not work, and we get a new error message. We can still use names:

df %>% pivot_wider(id_cols = c(site, day_of_exp),
                   names_from = time,
                   values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# If we include a variable that doesn't exist in the dataset, we get the same
# error message as when we tried to call the 4th column:

df %>%
  pivot_wider(id_cols = c(1, 5),
              names_from = time,
              values_from = temp)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

# It appears that the cause of this error when calling "id_cols" by number is
# that pivot_wider calls the columns from an object where the columns that are
# called in the "names_from" or "values_from" arguments have already been
# removed, leaving a object in this case that that has dimensions 6 x 2 instead
# of 6 x 4. This is corroborated if we call columns 1 and 2, which should call
# our site and day_of_exp variables if time and temp are removed:

df %>%
  pivot_wider(id_cols = c(1, 2),
              names_from = time,
              values_from = temp)
#> # A tibble: 3 × 4
#>    site day_of_exp   day night
#>   <int>      <dbl> <dbl> <dbl>
#> 1     1          1  24.7  2.65
#> 2     2         11  19.5 10.1 
#> 3     3         21  23.3  7.41

# This works, suggesting that this is where the error lies. 

# Detach library, remove tidyr, and reinstall original version if it was already
# installed:

detach("package:tidyr", unload = TRUE)

remove.packages("tidyr")
#> Removing package from '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
#> (as 'lib' is unspecified)

install_version("tidyr", 
                version = tidy_ver, 
                repos = "http://cran.us.r-project.org",
                quiet = TRUE)

Created on 2024-02-15 with reprex v2.1.0

A workaround is to use the column names in the id_cols argument; however, given older versions of the pivot_wider function allowed the use of column numbers and there is an increased potential for error when entering names, it would be nice if pivot_wider functioned like it did in tidyr 1.1.4 and before. I would also advocate for a return to the more descriptive error message when choosing variables that don't exist if that is possible, and including a descriptive warning/error if one attempts to choose a variable that is included in the names_from or values_from arguments.

If the removal of straightforward selection by column number is a feature and not a bug, I would request an update for the help and vignette to make this even clearer.

@rion-saeon
Copy link

rion-saeon commented Apr 11, 2024

In my case, I could specify id_cols like this and it worked. All indexes (col numbers) did not (i.e., when month_n is specified as 8)

pivot_wider(id_cols = 
                c(1:4, month_n), 
              names_from = "records_or_cards", 
              values_from = value)

@DavisVaughan
Copy link
Member

DavisVaughan commented Oct 24, 2024

Minimal reprex

library(tidyr)

# Set a seed for consistency
set_seed = 312

df <- data.frame(
  site = c(rep((1:3), each = 2)),
  time = rep(c("day", "night"), 3),
  temp = rnorm(6, rep(c(20, 10), each = 1), 4),
  day_of_exp = c(rep(1, 2), rep(11, 2), rep(21, 2))
)

# Columns 2 (time) and 3 (temp) are removed before evaluating the tidyselection
# for `id_cols`
df %>% pivot_wider(
  id_cols = c(1, 4),
  names_from = time,
  values_from = temp
)
#> Error in `rethrow_id_cols_oob()`:
#> ! `i` must be a single string, not an integer vector.
#> ℹ This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.

Comes from:

I think it is technically possible that we could evaluate the id_cols tidyselection in the context of the full data frame, and then do a post-hoc check that it doesn't collide with any names specified by names_from or values_from (which is why we implement it this way in the first place). Actually this would not work, because then everything() would also cause the error to get thrown, and we don't want that.

But I'd argue that using positions to perform the selection here is actually not a great thing to do. You should really prefer selecting with names. Ideally I think we'd keep the current behavior but prevent selecting by position through eval_select()


We've decided to wait for eval_select(allow_positional = FALSE) and use that for id_cols when we can to throw an informative error in this case and force the user to use names.

@DavisVaughan DavisVaughan changed the title Change in pivot_wider behavior and error messages when using column numbers for id_cols Use tidyselect::eval_select(allow_positional = FALSE) for id_cols when possible Oct 24, 2024
@DavisVaughan DavisVaughan added feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes" labels Oct 24, 2024
@rion-saeon
Copy link

rion-saeon commented Oct 25, 2024

@DavisVaughan the reason I want to use positions is that it seems more parsimonious and for the lazy coder like me. However, using indices are more risky than specifying names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes"
Projects
None yet
Development

No branches or pull requests

3 participants