From 4edfa2f4bde7626bc42cbb138024d8ec07af5d6d Mon Sep 17 00:00:00 2001
From: DavisVaughan
Date: Thu, 15 Aug 2024 21:53:48 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20tidyvers?=
 =?UTF-8?q?e/tidyr@9966c049541a708605b9c2d109251f4b14355c00=20=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 dev/news/index.html | 3 ++-
 dev/pkgdown.yml     | 2 +-
 dev/search.json     | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/dev/news/index.html b/dev/news/index.html
index f30a3cc8..43178910 100644
--- a/dev/news/index.html
+++ b/dev/news/index.html
@@ -59,7 +59,8 @@

tidyr (development version)


tidyr 1.3.1

CRAN release: 2024-01-24

diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml index 80e95842..5560fc69 100644 --- a/dev/pkgdown.yml +++ b/dev/pkgdown.yml @@ -8,7 +8,7 @@ articles: programming: programming.html rectangle: rectangle.html tidy-data: tidy-data.html -last_built: 2024-08-15T21:19Z +last_built: 2024-08-15T21:52Z urls: reference: https://tidyr.tidyverse.org/reference article: https://tidyr.tidyverse.org/articles diff --git a/dev/search.json b/dev/search.json index 5850b1f3..7ffa206b 100644 --- a/dev/search.json +++ b/dev/search.json @@ -1 +1 @@ -[{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. 
complaints reviewed investigated promptly fairly. community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to tidyr","title":"Contributing to tidyr","text":"outlines propose change tidyr. 
detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to tidyr","text":"Small typos grammatical errors documentation may edited directly using GitHub web interface, long changes made source file. YES: edit roxygen comment .R file R/. : edit .Rd file man/.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"prerequisites","dir":"","previous_headings":"","what":"Prerequisites","title":"Contributing to tidyr","text":"make substantial pull request, always file issue make sure someone team agrees ’s problem. ’ve found bug, create associated issue illustrate bug minimal reprex.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"","what":"Pull request process","title":"Contributing to tidyr","text":"recommend create Git branch pull request (PR). Look Travis AppVeyor build status making changes. README contain badges continuous integration services used package. New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat. Contributions test cases included easier accept. user-facing changes, add bullet top NEWS.md current development version header describing changes made followed GitHub username, links relevant issue(s)/PR(s).","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to tidyr","text":"Please note tidyr project released Contributor Code Conduct. contributing project agree abide terms.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"see-tidyverse-development-contributing-guide","dir":"","previous_headings":"","what":"See tidyverse development contributing guide","title":"Contributing to tidyr","text":"details.","code":""},{"path":"https://tidyr.tidyverse.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 tidyr authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://tidyr.tidyverse.org/dev/SUPPORT.html","id":null,"dir":"","previous_headings":"","what":"Getting help with tidyr","title":"Getting help with tidyr","text":"Thanks using tidyr. filing issue, places explore pieces put together make process smooth possible. Start making minimal reproducible example using reprex package. haven’t heard used reprex , ’re treat! Seriously, reprex make R-question-asking endeavors easier (pretty insane ROI five ten minutes ’ll take learn ’s ). 
additional reprex pointers, check Get help! section tidyverse site. Armed reprex, next step figure ask. ’s question: start forum.posit.co, /StackOverflow. people answer questions. ’s bug: ’re right place, file issue. ’re sure: let community help figure ! problem bug feature request, can easily return report . opening new issue, sure search issues pull requests make sure bug hasn’t reported /already fixed development version. default, search pre-populated :issue :open. can edit qualifiers (e.g. :pr, :closed) needed. example, ’d simply remove :open search issues repo, open closed. right place, need file issue, please review “File issues” paragraph tidyverse contributing guidelines. Thanks help!","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"In packages","text":"vignette serves two distinct, related, purposes: documents general best practices using tidyr package, inspired using ggplot2 packages. describes migration patterns transition tidyr v0.8.3 v1.0.0. release includes breaking changes nest() unnest() order increase consistency within tidyr rest tidyverse. go , ’ll attach packages use, expose version tidyr, make small dataset use examples.","code":"library(tidyr) library(dplyr, warn.conflicts = FALSE) library(purrr) packageVersion(\"tidyr\") #> [1] '1.3.1.9000' mini_iris <- as_tibble(iris)[c(1, 2, 51, 52, 101, 102), ] mini_iris #> # A tibble: 6 × 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3 1.4 0.2 setosa #> 3 7 3.2 4.7 1.4 versicolor #> 4 6.4 3.2 4.5 1.5 versicolor #> 5 6.3 3.3 6 2.5 virginica #> 6 5.8 2.7 5.1 1.9 virginica"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"using-tidyr-in-packages","dir":"Articles","previous_headings":"","what":"Using tidyr in packages","title":"In packages","text":"assume ’re already familiar using tidyr functions, described vignette(\"programming.Rmd\"). two important considerations using tidyr package: avoid R CMD CHECK notes using fixed variable names. alert upcoming changes development version tidyr.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"fixed-column-names","dir":"Articles","previous_headings":"Using tidyr in packages","what":"Fixed column names","title":"In packages","text":"know column names, code works way regardless whether inside outside package: R CMD check warn undefined global variables (Petal.Length, Petal.Width, Sepal.Length, Sepal.Width), doesn’t know nest() looking variables inside mini_iris (.e. Petal.Length friends data-variables, env-variables). easiest way silence note use all_of(). all_of() tidyselect helper (like starts_with(), ends_with(), etc.) takes column names stored strings: Alternatively, may want use any_of() OK specified variables found input data. tidyselect package offers entire family select helpers. 
probably already familiar using dplyr::select().","code":"mini_iris %>% nest( petal = c(Petal.Length, Petal.Width), sepal = c(Sepal.Length, Sepal.Width) ) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica mini_iris %>% nest( petal = all_of(c(\"Petal.Length\", \"Petal.Width\")), sepal = all_of(c(\"Sepal.Length\", \"Sepal.Width\")) ) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica "},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"continuous-integration","dir":"Articles","previous_headings":"Using tidyr in packages","what":"Continuous integration","title":"In packages","text":"Hopefully ’ve already adopted continuous integration package, R CMD check (includes tests) run regular basis, e.g. every time push changes package’s source GitHub similar. tidyverse team currently relies heavily GitHub Actions, example. usethis::use_github_action() can help get started. recommend adding workflow targets devel version tidyr. ? Always? package tightly coupled tidyr, consider leaving place time, know changes tidyr affect package. Right tidyr release? everyone else, add (re-activate existing) tidyr-devel workflow period preceding major tidyr release potential breaking changes, especially ’ve contacted reverse dependency checks. Example GitHub Actions workflow tests package development version tidyr: GitHub Actions evolving landscape, can always mine workflows tidyr (tidyverse/tidyr/.github/workflows) main r-lib/actions repo ideas.","code":"on: push: branches: - main pull_request: branches: - main name: R-CMD-check-tidyr-devel jobs: R-CMD-check: runs-on: macOS-latest steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-r@v2 - name: Install dependencies run: | install.packages(c(\"remotes\", \"rcmdcheck\")) remotes::install_deps(dependencies = TRUE) remotes::install_github(\"tidyverse/tidyr\") shell: Rscript {0} - name: Check run: rcmdcheck::rcmdcheck(args = \"--no-manual\", error_on = \"error\") shell: Rscript {0}"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"tidyr-v0-8-3---v1-0-0","dir":"Articles","previous_headings":"","what":"tidyr v0.8.3 -> v1.0.0","title":"In packages","text":"v1.0.0 makes considerable changes interface nest() unnest() order bring line newer tidyverse conventions. tried make functions backward compatible possible give informative warning messages, cover 100% use cases, may need change package code. guide help minimum pain. Ideally, ’ll tweak package works tidyr 0.8.3 tidyr 1.0.0. makes life considerably easier means ’s need coordinate CRAN submissions - can submit package works tidyr versions, submit tidyr CRAN. section describes recommend practices , drawing general principles described https://design.tidyverse.org/changes-multivers.html. use continuous integration already, strongly recommend adding build tests development version tidyr; see details. section briefly describes run different code different versions tidyr, goes major changes might require workarounds: nest() unnest() get new interfaces. nest() preserves groups. nest_() unnest_() defunct. ’re struggling problem ’s described , please reach via github email can help .","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"conditional-code","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"Conditional code","title":"In packages","text":"Sometimes ’ll able write code works v0.8.3 v1.0.0. 
often requires code ’s particularly natural either version ’d better (temporarily) separate code paths, containing non-contrived code. get re-use existing code “old” branch, eventually phased , write clean, forward-looking code “new” branch. basic approach looks like . First define function returns TRUE new versions tidyr: highly recommend keeping function provides obvious place jot transition notes package, makes easier remove transitional code later . Another benefit tidyr version determined run time, build time, therefore detect user’s current tidyr version. functions, use statement call different code different versions: new code uses function exists tidyr 1.0.0, get NOTE R CMD check: one notes can explain CRAN submission comments. Just mention ’s forward compatibility tidyr 1.0.0, CRAN let package .","code":"tidyr_new_interface <- function() { packageVersion(\"tidyr\") > \"0.8.99\" } my_function_inside_a_package <- function(...) # my code here if (tidyr_new_interface()) { # Freshly written code for v1.0.0 out <- tidyr::nest(df, data = any_of(c(\"x\", \"y\", \"z\"))) } else { # Existing code for v0.8.3 out <- tidyr::nest(df, x, y, z) } # more code here }"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"new-syntax-for-nest","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"New syntax for nest()","title":"In packages","text":"changed: --nested columns longer accepted “loose parts”. new list-column’s name longer provided via .key argument. Now use construct like : new_col = . changed: use ... metadata problematic pattern ’re moving away . https://design.tidyverse.org/dots-data.html new_col = construct lets us create multiple nested list-columns (“multi-nest”). examples: need quick dirty fix without think, just call nest_legacy() instead nest(). ’s nest() v0.8.3:","code":"mini_iris %>% nest(petal = matches(\"Petal\"), sepal = matches(\"Sepal\")) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica # v0.8.3 mini_iris %>% nest(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, .key = \"my_data\") # v1.0.0 mini_iris %>% nest(my_data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) # v1.0.0 avoiding R CMD check NOTE mini_iris %>% nest(my_data = any_of(c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\"))) # or equivalently: mini_iris %>% nest(my_data = !any_of(\"Species\")) if (tidyr_new_interface()) { out <- tidyr::nest_legacy(df, x, y, z) } else { out <- tidyr::nest(df, x, y, z) }"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"new-syntax-for-unnest","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"New syntax for unnest()","title":"In packages","text":"changed: --unnested columns must now specified explicitly, instead defaulting list-columns. also deprecates .drop .preserve. .sep deprecated replaced names_sep. unnest() uses emerging tidyverse standard disambiguate duplicated names. Use names_repair = tidyr_legacy request previous approach. .id deprecated can easily replaced creating column names prior unnest(), e.g. upstream call mutate(). changed: use ... metadata problematic pattern ’re moving away . https://design.tidyverse.org/dots-data.html changes details arguments relate features rolling across multiple packages tidyverse. example, ptype exposes prototype support new vctrs package. names_repair specifies duplicated non-syntactic names, consistent tibble readxl. 
: need quick dirty fix without think, just call unnest_legacy() instead unnest(). ’s unnest() v0.8.3:","code":"# v0.8.3 df %>% unnest(x, .id = \"id\") # v1.0.0 df %>% mutate(id = names(x)) %>% unnest(x)) nested <- mini_iris %>% nest(my_data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) # v0.8.3 automatically unnests list-cols nested %>% unnest() # v1.0.0 must be told which columns to unnest nested %>% unnest(any_of(\"my_data\")) if (tidyr_new_interface()) { out <- tidyr::unnest_legacy(df) } else { out <- tidyr::unnest(df) }"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"nest-preserves-groups","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"nest() preserves groups","title":"In packages","text":"changed: nest() now preserves groups present input. changed: reflect growing support grouped data frames, especially recent releases dplyr. See, example, dplyr::group_modify(), group_map(), friends. fact nest() now preserves groups problematic downstream, choices: Apply ungroup() result. level pragmatism suggests, however, least consider next two options. never grouped first place. Eliminate group_by() call specify columns nested versus nested directly nest(). Adjust downstream code accommodate grouping. Imagine used group_by() nest() mini_iris, computed list-column outside data frame. now try add back data post hoc: fails df grouped mutate() group-aware, ’s hard add completely external variable. pragmatically ungroup()ing, can ? One option work inside data frame, .e. bring map() inside mutate(), design problem away: , somehow, grouping seems appropriate working inside data frame option, tibble::add_column() group-unaware. lets add external data grouped data frame.","code":"(df <- mini_iris %>% group_by(Species) %>% nest()) #> # A tibble: 3 × 2 #> # Groups: Species [3] #> Species data #> #> 1 setosa #> 2 versicolor #> 3 virginica (external_variable <- map_int(df$data, nrow)) #> [1] 2 2 2 df %>% mutate(n_rows = external_variable) #> Error in `mutate()`: #> ℹ In argument: `n_rows = external_variable`. #> ℹ In group 1: `Species = setosa`. #> Caused by error: #> ! `n_rows` must be size 1, not 3. df %>% mutate(n_rows = map_int(data, nrow)) #> # A tibble: 3 × 3 #> # Groups: Species [3] #> Species data n_rows #> #> 1 setosa 2 #> 2 versicolor 2 #> 3 virginica 2 df %>% tibble::add_column(n_rows = external_variable) #> # A tibble: 3 × 3 #> # Groups: Species [3] #> Species data n_rows #> #> 1 setosa 2 #> 2 versicolor 2 #> 3 virginica 2"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"nest_-and-unnest_-are-defunct","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"nest_() and unnest_() are defunct","title":"In packages","text":"changed: nest_() unnest_() longer work changed: Specialized standard evaluation versions functions, e.g., foo_() complement foo(). older lazyeval framework. :","code":"# v0.8.3 mini_iris %>% nest_( key_col = \"my_data\", nest_cols = c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\") ) nested %>% unnest_(~ my_data) # v1.0.0 mini_iris %>% nest(my_data = any_of(c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\"))) nested %>% unnest(any_of(\"my_data\"))"},{"path":"https://tidyr.tidyverse.org/dev/articles/nest.html","id":"basics","dir":"Articles","previous_headings":"","what":"Basics","title":"Nested data","text":"nested data frame data frame one () columns list data frames. 
can create simple nested data frames hand: (possible create list-columns regular data frames, just tibbles, ’s considerably work default behaviour data.frame() treat lists lists columns.) commonly ’ll create tidyr::nest(): nest() specifies variables nested inside; alternative use dplyr::group_by() describe variables kept outside. think nesting easiest understand connection grouped data: row output corresponds one group input. ’ll see shortly particularly convenient per-group objects. opposite nest() unnest(). give name list-column containing data frames, row-binds data frames together, repeating outer columns right number times line .","code":"df1 <- tibble( g = c(1, 2, 3), data = list( tibble(x = 1, y = 2), tibble(x = 4:5, y = 6:7), tibble(x = 10) ) ) df1 #> # A tibble: 3 × 2 #> g data #> #> 1 1 #> 2 2 #> 3 3 df2 <- tribble( ~g, ~x, ~y, 1, 1, 2, 2, 4, 6, 2, 5, 7, 3, 10, NA ) df2 %>% nest(data = c(x, y)) #> # A tibble: 3 × 2 #> g data #> #> 1 1 #> 2 2 #> 3 3 df2 %>% group_by(g) %>% nest() #> # A tibble: 3 × 2 #> # Groups: g [3] #> g data #> #> 1 1 #> 2 2 #> 3 3 df1 %>% unnest(data) #> # A tibble: 4 × 3 #> g x y #> #> 1 1 1 2 #> 2 2 4 6 #> 3 2 5 7 #> 4 3 10 NA"},{"path":"https://tidyr.tidyverse.org/dev/articles/nest.html","id":"nested-data-and-models","dir":"Articles","previous_headings":"","what":"Nested data and models","title":"Nested data","text":"Nested data great fit problems one something group. common place arises ’re fitting multiple models. list data frames, ’s natural produce list models: even produce list predictions: workflow works particularly well conjunction broom, makes easy turn models tidy data frames can unnest()ed get back flat data frames. can see bigger example broom dplyr vignette.","code":"mtcars_nested <- mtcars %>% group_by(cyl) %>% nest() mtcars_nested #> # A tibble: 3 × 2 #> # Groups: cyl [3] #> cyl data #> #> 1 6 #> 2 4 #> 3 8 mtcars_nested <- mtcars_nested %>% mutate(model = map(data, function(df) lm(mpg ~ wt, data = df))) mtcars_nested #> # A tibble: 3 × 3 #> # Groups: cyl [3] #> cyl data model #> #> 1 6 #> 2 4 #> 3 8 mtcars_nested <- mtcars_nested %>% mutate(model = map(model, predict)) mtcars_nested #> # A tibble: 3 × 3 #> # Groups: cyl [3] #> cyl data model #> #> 1 6 #> 2 4 #> 3 8 "},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Pivoting","text":"vignette describes use new pivot_longer() pivot_wider() functions. goal improve usability gather() spread(), incorporate state---art features found packages. time, ’s obvious something fundamentally wrong design spread() gather(). Many people don’t find names intuitive find hard remember direction corresponds spreading gathering. also seems surprisingly hard remember arguments functions, meaning many people (including !) consult documentation every time. two important new features inspired R packages advancing reshaping R: pivot_longer() can work multiple value variables may different types, inspired enhanced melt() dcast() functions provided data.table package Matt Dowle Arun Srinivasan. pivot_longer() pivot_wider() can take data frame specifies precisely metadata stored column names becomes data variables (vice versa), inspired cdata package John Mount Nina Zumel. vignette, ’ll learn key ideas behind pivot_longer() pivot_wider() see used solve variety data reshaping challenges ranging simple complex. begin ’ll load needed packages. 
real analysis code, ’d imagine ’d library(tidyverse), can’t since vignette embedded package.","code":"library(tidyr) library(dplyr) library(readr)"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"longer","dir":"Articles","previous_headings":"","what":"Longer","title":"Pivoting","text":"pivot_longer() makes datasets longer increasing number rows decreasing number columns. don’t believe makes sense describe dataset “long form”. Length relative term, can say (e.g.) dataset longer dataset B. pivot_longer() commonly needed tidy wild-caught datasets often optimise ease data entry ease comparison rather ease analysis. following sections show use pivot_longer() wide range realistic datasets.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"pew","dir":"Articles","previous_headings":"Longer","what":"String data in column names","title":"Pivoting","text":"relig_income dataset stores counts based survey (among things) asked people religion annual income: dataset contains three variables: religion, stored rows, income spread across column names, count stored cell values. tidy use pivot_longer(): first argument dataset reshape, relig_income. cols describes columns need reshaped. case, ’s every column apart religion. names_to gives name variable created data stored column names, .e. income. values_to gives name variable created data stored cell value, .e. count. Neither names_to values_to column exists relig_income, provide strings surrounded quotes.","code":"relig_income #> # A tibble: 18 × 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` #> #> 1 Agnostic 27 34 60 81 76 137 #> 2 Atheist 12 27 37 52 35 70 #> 3 Buddhist 27 21 30 34 33 58 #> 4 Catholic 418 617 732 670 638 1116 #> 5 Don’t know/r… 15 14 15 11 10 35 #> 6 Evangelical … 575 869 1064 982 881 1486 #> 7 Hindu 1 9 7 9 11 34 #> 8 Historically… 228 244 236 238 197 223 #> 9 Jehovah's Wi… 20 27 24 24 21 30 #> 10 Jewish 19 19 25 25 30 95 #> # ℹ 8 more rows #> # ℹ 4 more variables: `$75-100k` , `$100-150k` , `>150k` , #> # `Don't know/refused` relig_income %>% pivot_longer( cols = !religion, names_to = \"income\", values_to = \"count\" ) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"billboard","dir":"Articles","previous_headings":"Longer","what":"Numeric data in column names","title":"Pivoting","text":"billboard dataset records billboard rank songs year 2000. form similar relig_income data, data encoded column names really number, string. can start basic specification relig_income dataset. want names become variable called week, values become variable called rank. also use values_drop_na drop rows correspond missing values. every song stays charts 76 weeks, structure input data force creation unnecessary explicit NAs. nice easily determine long song stayed charts, , ’ll need convert week variable integer. 
can using two additional arguments: names_prefix strips wk prefix, names_transform converts week integer: Alternatively, single argument using readr::parse_number() automatically strips non-numeric components:","code":"billboard #> # A tibble: 317 × 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 #> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 #> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA #> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 #> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 #> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 #> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA #> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 #> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 #> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 #> # ℹ 307 more rows #> # ℹ 69 more variables: wk8 , wk9 , wk10 , wk11 , #> # wk12 , wk13 , wk14 , wk15 , wk16 , #> # wk17 , wk18 , wk19 , wk20 , wk21 , #> # wk22 , wk23 , wk24 , wk25 , wk26 , #> # wk27 , wk28 , wk29 , wk30 , wk31 , #> # wk32 , wk33 , wk34 , wk35 , wk36 , … billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", values_to = \"rank\", values_drop_na = TRUE ) #> # A tibble: 5,307 × 5 #> artist track date.entered week rank #> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87 #> 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92 #> # ℹ 5,297 more rows billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", names_prefix = \"wk\", names_transform = as.integer, values_to = \"rank\", values_drop_na = TRUE, ) billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", names_transform = readr::parse_number, values_to = \"rank\", values_drop_na = TRUE, )"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"many-variables-in-column-names","dir":"Articles","previous_headings":"Longer","what":"Many variables in column names","title":"Pivoting","text":"challenging situation occurs multiple variables crammed column names. example, take dataset: country, iso2, iso3, year already variables, can left . columns new_sp_m014 newrel_f65 encode four variables names: new_/new prefix indicates counts new cases. dataset contains new cases, ’ll ignore ’s constant. sp/rel/ep describe case diagnosed. m/f gives gender. 014/1524/2535/3544/4554/65 supplies age range. can break variables specifying multiple column names names_to, either providing names_sep names_pattern. names_pattern natural fit. similar interface extract: give regular expression containing groups (defined ()) puts group column. go one step use readr functions convert gender age factors. think good practice categorical variables known set values. 
way little efficient mutate fact, pivot_longer() transform one occurence name mutate() need transform many repetitions.","code":"who #> # A tibble: 7,240 × 60 #> country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 #> #> 1 Afghanistan AF AFG 1980 NA NA NA #> 2 Afghanistan AF AFG 1981 NA NA NA #> 3 Afghanistan AF AFG 1982 NA NA NA #> 4 Afghanistan AF AFG 1983 NA NA NA #> 5 Afghanistan AF AFG 1984 NA NA NA #> 6 Afghanistan AF AFG 1985 NA NA NA #> 7 Afghanistan AF AFG 1986 NA NA NA #> 8 Afghanistan AF AFG 1987 NA NA NA #> 9 Afghanistan AF AFG 1988 NA NA NA #> 10 Afghanistan AF AFG 1989 NA NA NA #> # ℹ 7,230 more rows #> # ℹ 53 more variables: new_sp_m3544 , new_sp_m4554 , #> # new_sp_m5564 , new_sp_m65 , new_sp_f014 , #> # new_sp_f1524 , new_sp_f2534 , new_sp_f3544 , #> # new_sp_f4554 , new_sp_f5564 , new_sp_f65 , #> # new_sn_m014 , new_sn_m1524 , new_sn_m2534 , #> # new_sn_m3544 , new_sn_m4554 , new_sn_m5564 , … who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c(\"diagnosis\", \"gender\", \"age\"), names_pattern = \"new_?(.*)_(.)(.*)\", values_to = \"count\" ) #> # A tibble: 405,440 × 8 #> country iso2 iso3 year diagnosis gender age count #> #> 1 Afghanistan AF AFG 1980 sp m 014 NA #> 2 Afghanistan AF AFG 1980 sp m 1524 NA #> 3 Afghanistan AF AFG 1980 sp m 2534 NA #> 4 Afghanistan AF AFG 1980 sp m 3544 NA #> 5 Afghanistan AF AFG 1980 sp m 4554 NA #> 6 Afghanistan AF AFG 1980 sp m 5564 NA #> 7 Afghanistan AF AFG 1980 sp m 65 NA #> 8 Afghanistan AF AFG 1980 sp f 014 NA #> 9 Afghanistan AF AFG 1980 sp f 1524 NA #> 10 Afghanistan AF AFG 1980 sp f 2534 NA #> # ℹ 405,430 more rows who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c(\"diagnosis\", \"gender\", \"age\"), names_pattern = \"new_?(.*)_(.)(.*)\", names_transform = list( gender = ~ readr::parse_factor(.x, levels = c(\"f\", \"m\")), age = ~ readr::parse_factor( .x, levels = c(\"014\", \"1524\", \"2534\", \"3544\", \"4554\", \"5564\", \"65\"), ordered = TRUE ) ), values_to = \"count\", )"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"multiple-observations-per-row","dir":"Articles","previous_headings":"Longer","what":"Multiple observations per row","title":"Pivoting","text":"far, working data frames one observation per row, many important pivoting problems involve multiple observations per row. can usually recognise case name column want appear output part column name input. section, ’ll learn pivot sort data. following example adapted data.table vignette, inspiration tidyr’s solution problem. Note two pieces information (values) child: name dob (date birth). need go separate columns result. supply multiple variables names_to, using names_sep split variable name. Note special name .value: tells pivot_longer() part column name specifies “value” measured (become variable output). Note use values_drop_na = TRUE: input shape forces creation explicit missing variables observations don’t exist. similar problem problem also exists anscombe dataset built base R: dataset contains four pairs variables (x1 y1, x2 y2, etc) underlie Anscombe’s quartet, collection four datasets summary statistics (mean, sd, correlation etc), quite different data. want produce dataset columns set, x y. Setting cols_vary \"slowest\" groups values columns x1 y1 together rows output moving x2 y2. argument often produces intuitively ordered output pivoting every column dataset. similar situation can arise panel data. example, take example dataset provided Thomas Leeper. 
can tidy using approach anscombe:","code":"household #> # A tibble: 5 × 5 #> family dob_child1 dob_child2 name_child1 name_child2 #> #> 1 1 1998-11-26 2000-01-29 Susan Jose #> 2 2 1996-06-22 NA Mark NA #> 3 3 2002-07-11 2004-04-05 Sam Seth #> 4 4 2004-10-10 2009-08-27 Craig Khai #> 5 5 2000-12-05 2005-02-28 Parker Gracie household %>% pivot_longer( cols = !family, names_to = c(\".value\", \"child\"), names_sep = \"_\", values_drop_na = TRUE ) #> # A tibble: 9 × 4 #> family child dob name #> #> 1 1 child1 1998-11-26 Susan #> 2 1 child2 2000-01-29 Jose #> 3 2 child1 1996-06-22 Mark #> 4 3 child1 2002-07-11 Sam #> 5 3 child2 2004-04-05 Seth #> 6 4 child1 2004-10-10 Craig #> 7 4 child2 2009-08-27 Khai #> 8 5 child1 2000-12-05 Parker #> 9 5 child2 2005-02-28 Gracie anscombe #> x1 x2 x3 x4 y1 y2 y3 y4 #> 1 10 10 10 8 8.04 9.14 7.46 6.58 #> 2 8 8 8 8 6.95 8.14 6.77 5.76 #> 3 13 13 13 8 7.58 8.74 12.74 7.71 #> 4 9 9 9 8 8.81 8.77 7.11 8.84 #> 5 11 11 11 8 8.33 9.26 7.81 8.47 #> 6 14 14 14 8 9.96 8.10 8.84 7.04 #> 7 6 6 6 8 7.24 6.13 6.08 5.25 #> 8 4 4 4 19 4.26 3.10 5.39 12.50 #> 9 12 12 12 8 10.84 9.13 8.15 5.56 #> 10 7 7 7 8 4.82 7.26 6.42 7.91 #> 11 5 5 5 8 5.68 4.74 5.73 6.89 anscombe %>% pivot_longer( cols = everything(), cols_vary = \"slowest\", names_to = c(\".value\", \"set\"), names_pattern = \"(.)(.)\" ) #> # A tibble: 44 × 3 #> set x y #> #> 1 1 10 8.04 #> 2 1 8 6.95 #> 3 1 13 7.58 #> 4 1 9 8.81 #> 5 1 11 8.33 #> 6 1 14 9.96 #> 7 1 6 7.24 #> 8 1 4 4.26 #> 9 1 12 10.8 #> 10 1 7 4.82 #> # ℹ 34 more rows pnl <- tibble( x = 1:4, a = c(1, 1,0, 0), b = c(0, 1, 1, 1), y1 = rnorm(4), y2 = rnorm(4), z1 = rep(3, 4), z2 = rep(-2, 4), ) pnl %>% pivot_longer( cols = !c(x, a, b), names_to = c(\".value\", \"time\"), names_pattern = \"(.)(.)\" ) #> # A tibble: 8 × 6 #> x a b time y z #> #> 1 1 1 0 1 -1.40 3 #> 2 1 1 0 2 0.622 -2 #> 3 2 1 1 1 0.255 3 #> 4 2 1 1 2 1.15 -2 #> 5 3 0 1 1 -2.44 3 #> 6 3 0 1 2 -1.82 -2 #> 7 4 0 1 1 -0.00557 3 #> 8 4 0 1 2 -0.247 -2"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"wider","dir":"Articles","previous_headings":"","what":"Wider","title":"Pivoting","text":"pivot_wider() opposite pivot_longer(): makes dataset wider increasing number columns decreasing number rows. ’s relatively rare need pivot_wider() make tidy data, ’s often useful creating summary tables presentation, data format needed tools.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"capture-recapture-data","dir":"Articles","previous_headings":"Wider","what":"Capture-recapture data","title":"Pivoting","text":"fish_encounters dataset, contributed Myfanwy Johnston, describes fish swimming river detected automatic monitoring stations: Many tools used analyse data need form station column: dataset records fish detected station - doesn’t record wasn’t detected (common type data). means output data filled NAs. 
However, case know absence record means fish seen, can ask pivot_wider() fill missing values zeros:","code":"fish_encounters #> # A tibble: 114 × 3 #> fish station seen #> #> 1 4842 Release 1 #> 2 4842 I80_1 1 #> 3 4842 Lisbon 1 #> 4 4842 Rstr 1 #> 5 4842 Base_TD 1 #> 6 4842 BCE 1 #> 7 4842 BCW 1 #> 8 4842 BCE2 1 #> 9 4842 BCW2 1 #> 10 4842 MAE 1 #> # ℹ 104 more rows fish_encounters %>% pivot_wider( names_from = station, values_from = seen ) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 NA NA NA NA NA #> 5 4847 1 1 1 NA NA NA NA NA NA NA #> 6 4848 1 1 1 1 NA NA NA NA NA NA #> 7 4849 1 1 NA NA NA NA NA NA NA NA #> 8 4850 1 1 NA 1 1 1 1 NA NA NA #> 9 4851 1 1 NA NA NA NA NA NA NA NA #> 10 4854 1 1 NA NA NA NA NA NA NA NA #> # ℹ 9 more rows #> # ℹ 1 more variable: MAW fish_encounters %>% pivot_wider( names_from = station, values_from = seen, values_fill = 0 ) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 0 0 0 0 0 #> 5 4847 1 1 1 0 0 0 0 0 0 0 #> 6 4848 1 1 1 1 0 0 0 0 0 0 #> 7 4849 1 1 0 0 0 0 0 0 0 0 #> 8 4850 1 1 0 1 1 1 1 0 0 0 #> 9 4851 1 1 0 0 0 0 0 0 0 0 #> 10 4854 1 1 0 0 0 0 0 0 0 0 #> # ℹ 9 more rows #> # ℹ 1 more variable: MAW "},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"aggregation","dir":"Articles","previous_headings":"Wider","what":"Aggregation","title":"Pivoting","text":"can also use pivot_wider() perform simple aggregation. example, take warpbreaks dataset built base R (converted tibble better print method): designed experiment nine replicates every combination wool (B) tension (L, M, H): happens attempt pivot levels wool columns? get warning cell output corresponds multiple cells input. default behaviour produces list-columns, contain individual values. useful output summary statistics, e.g. mean breaks combination wool tension: complex summary operations, recommend summarising reshaping, simple cases ’s often convenient summarise within pivot_wider().","code":"warpbreaks <- warpbreaks %>% as_tibble() %>% select(wool, tension, breaks) warpbreaks #> # A tibble: 54 × 3 #> wool tension breaks #> #> 1 A L 26 #> 2 A L 30 #> 3 A L 54 #> 4 A L 25 #> 5 A L 70 #> 6 A L 52 #> 7 A L 51 #> 8 A L 26 #> 9 A L 67 #> 10 A M 18 #> # ℹ 44 more rows warpbreaks %>% count(wool, tension) #> # A tibble: 6 × 3 #> wool tension n #> #> 1 A L 9 #> 2 A M 9 #> 3 A H 9 #> 4 B L 9 #> 5 B M 9 #> 6 B H 9 warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks ) #> Warning: Values from `breaks` are not uniquely identified; output will contain #> list-cols. #> • Use `values_fn = list` to suppress this warning. #> • Use `values_fn = {summary_fun}` to summarise duplicates. #> • Use the following dplyr code to identify duplicates. 
#> {data} |> #> dplyr::summarise(n = dplyr::n(), .by = c(tension, wool)) |> #> dplyr::filter(n > 1L) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L #> 2 M #> 3 H warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = mean ) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L 44.6 28.2 #> 2 M 24 28.8 #> 3 H 24.6 18.8"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"generate-column-name-from-multiple-variables","dir":"Articles","previous_headings":"Wider","what":"Generate column name from multiple variables","title":"Pivoting","text":"Imagine, https://stackoverflow.com/questions/24929954, information containing combination product, country, year. tidy form might look like : want widen data one column combination product country. key specify multiple variables names_from: either names_from values_from select multiple variables, can control column names output constructed names_sep names_prefix, workhorse names_glue:","code":"production <- expand_grid( product = c(\"A\", \"B\"), country = c(\"AI\", \"EI\"), year = 2000:2014 ) %>% filter((product == \"A\" & country == \"AI\") | product == \"B\") %>% mutate(production = rnorm(nrow(.))) production #> # A tibble: 45 × 4 #> product country year production #> #> 1 A AI 2000 -0.244 #> 2 A AI 2001 -0.283 #> 3 A AI 2002 -0.554 #> 4 A AI 2003 0.629 #> 5 A AI 2004 2.07 #> 6 A AI 2005 -1.63 #> 7 A AI 2006 0.512 #> 8 A AI 2007 -1.86 #> 9 A AI 2008 -0.522 #> 10 A AI 2009 -0.0526 #> # ℹ 35 more rows production %>% pivot_wider( names_from = c(product, country), values_from = production ) #> # A tibble: 15 × 4 #> year A_AI B_AI B_EI #> #> 1 2000 -0.244 0.738 -0.313 #> 2 2001 -0.283 1.89 1.07 #> 3 2002 -0.554 -0.0974 0.0700 #> 4 2003 0.629 -0.936 -0.639 #> 5 2004 2.07 -0.0160 -0.0500 #> 6 2005 -1.63 -0.827 -0.251 #> 7 2006 0.512 -1.51 0.445 #> 8 2007 -1.86 0.935 2.76 #> 9 2008 -0.522 0.176 0.0465 #> 10 2009 -0.0526 0.244 0.578 #> # ℹ 5 more rows production %>% pivot_wider( names_from = c(product, country), values_from = production, names_sep = \".\", names_prefix = \"prod.\" ) #> # A tibble: 15 × 4 #> year prod.A.AI prod.B.AI prod.B.EI #> #> 1 2000 -0.244 0.738 -0.313 #> 2 2001 -0.283 1.89 1.07 #> 3 2002 -0.554 -0.0974 0.0700 #> 4 2003 0.629 -0.936 -0.639 #> 5 2004 2.07 -0.0160 -0.0500 #> 6 2005 -1.63 -0.827 -0.251 #> 7 2006 0.512 -1.51 0.445 #> 8 2007 -1.86 0.935 2.76 #> 9 2008 -0.522 0.176 0.0465 #> 10 2009 -0.0526 0.244 0.578 #> # ℹ 5 more rows production %>% pivot_wider( names_from = c(product, country), values_from = production, names_glue = \"prod_{product}_{country}\" ) #> # A tibble: 15 × 4 #> year prod_A_AI prod_B_AI prod_B_EI #> #> 1 2000 -0.244 0.738 -0.313 #> 2 2001 -0.283 1.89 1.07 #> 3 2002 -0.554 -0.0974 0.0700 #> 4 2003 0.629 -0.936 -0.639 #> 5 2004 2.07 -0.0160 -0.0500 #> 6 2005 -1.63 -0.827 -0.251 #> 7 2006 0.512 -1.51 0.445 #> 8 2007 -1.86 0.935 2.76 #> 9 2008 -0.522 0.176 0.0465 #> 10 2009 -0.0526 0.244 0.578 #> # ℹ 5 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"tidy-census","dir":"Articles","previous_headings":"Wider","what":"Tidy census","title":"Pivoting","text":"us_rent_income dataset contains information median income rent state US 2017 (American Community Survey, retrieved tidycensus package). 
estimate moe values columns, can supply values_from: Note name variable automatically appended output columns.","code":"us_rent_income #> # A tibble: 104 × 5 #> GEOID NAME variable estimate moe #> #> 1 01 Alabama income 24476 136 #> 2 01 Alabama rent 747 3 #> 3 02 Alaska income 32940 508 #> 4 02 Alaska rent 1200 13 #> 5 04 Arizona income 27517 148 #> 6 04 Arizona rent 972 4 #> 7 05 Arkansas income 23789 165 #> 8 05 Arkansas rent 709 5 #> 9 06 California income 29454 109 #> 10 06 California rent 1358 3 #> # ℹ 94 more rows us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"implicit-missing-values","dir":"Articles","previous_headings":"Wider","what":"Implicit missing values","title":"Pivoting","text":"Occasionally, ’ll come across data names variable encoded factor, data represented. pivot_wider() defaults generating columns values actually represented data, might want include column possible level case data changes future. names_expand argument turn implicit factor levels explicit ones, forcing represented result. also sorts column names using level order, produces intuitive results case. multiple names_from columns provided, names_expand generate Cartesian product possible combinations names_from values. Notice following data omitted rows percentage value 0. names_expand allows us make explicit pivot. related problem can occur implicit missing factor levels combinations id_cols. case, missing rows (rather columns) ’d like explicitly represent. example, ’ll modify daily data type column, pivot instead, keeping day id column. type levels represented columns, missing rows related unrepresented day factor levels. 
can use id_expand way used names_expand, expand (sort) implicit missing rows id_cols.","code":"weekdays <- c(\"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Sat\", \"Sun\") daily <- tibble( day = factor(c(\"Tue\", \"Thu\", \"Fri\", \"Mon\"), levels = weekdays), value = c(2, 3, 1, 5) ) daily #> # A tibble: 4 × 2 #> day value #> #> 1 Tue 2 #> 2 Thu 3 #> 3 Fri 1 #> 4 Mon 5 daily %>% pivot_wider( names_from = day, values_from = value ) #> # A tibble: 1 × 4 #> Tue Thu Fri Mon #> #> 1 2 3 1 5 daily %>% pivot_wider( names_from = day, values_from = value, names_expand = TRUE ) #> # A tibble: 1 × 7 #> Mon Tue Wed Thu Fri Sat Sun #> #> 1 5 2 NA 3 1 NA NA percentages <- tibble( year = c(2018, 2019, 2020, 2020), type = factor(c(\"A\", \"B\", \"A\", \"B\"), levels = c(\"A\", \"B\")), percentage = c(100, 100, 40, 60) ) percentages #> # A tibble: 4 × 3 #> year type percentage #> #> 1 2018 A 100 #> 2 2019 B 100 #> 3 2020 A 40 #> 4 2020 B 60 percentages %>% pivot_wider( names_from = c(year, type), values_from = percentage, names_expand = TRUE, values_fill = 0 ) #> # A tibble: 1 × 6 #> `2018_A` `2018_B` `2019_A` `2019_B` `2020_A` `2020_B` #> #> 1 100 0 0 100 40 60 daily <- mutate(daily, type = factor(c(\"A\", \"B\", \"B\", \"A\"))) daily #> # A tibble: 4 × 3 #> day value type #> #> 1 Tue 2 A #> 2 Thu 3 B #> 3 Fri 1 B #> 4 Mon 5 A daily %>% pivot_wider( names_from = type, values_from = value, values_fill = 0 ) #> # A tibble: 4 × 3 #> day A B #> #> 1 Tue 2 0 #> 2 Thu 0 3 #> 3 Fri 0 1 #> 4 Mon 5 0 daily %>% pivot_wider( names_from = type, values_from = value, values_fill = 0, id_expand = TRUE ) #> # A tibble: 7 × 3 #> day A B #> #> 1 Mon 5 0 #> 2 Tue 2 0 #> 3 Wed 0 0 #> 4 Thu 0 3 #> 5 Fri 0 1 #> 6 Sat 0 0 #> 7 Sun 0 0"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"unused-columns","dir":"Articles","previous_headings":"Wider","what":"Unused columns","title":"Pivoting","text":"Imagine ’ve found situation columns data completely unrelated pivoting process, ’d still like retain information somehow. example, updates ’d like pivot system column create one row summaries county’s system updates. typical pivot_wider() call, completely lose information date column. example, ’d like retain recent update date across systems particular county. accomplish can use unused_fn argument, allows us summarize values columns utilized pivoting process. 
can also retain data delay aggregation entirely using list() summary function.","code":"updates <- tibble( county = c(\"Wake\", \"Wake\", \"Wake\", \"Guilford\", \"Guilford\"), date = c(as.Date(\"2020-01-01\") + 0:2, as.Date(\"2020-01-03\") + 0:1), system = c(\"A\", \"B\", \"C\", \"A\", \"C\"), value = c(3.2, 4, 5.5, 2, 1.2) ) updates #> # A tibble: 5 × 4 #> county date system value #> #> 1 Wake 2020-01-01 A 3.2 #> 2 Wake 2020-01-02 B 4 #> 3 Wake 2020-01-03 C 5.5 #> 4 Guilford 2020-01-03 A 2 #> 5 Guilford 2020-01-04 C 1.2 updates %>% pivot_wider( id_cols = county, names_from = system, values_from = value ) #> # A tibble: 2 × 4 #> county A B C #> #> 1 Wake 3.2 4 5.5 #> 2 Guilford 2 NA 1.2 updates %>% pivot_wider( id_cols = county, names_from = system, values_from = value, unused_fn = list(date = max) ) #> # A tibble: 2 × 5 #> county A B C date #> #> 1 Wake 3.2 4 5.5 2020-01-03 #> 2 Guilford 2 NA 1.2 2020-01-04 updates %>% pivot_wider( id_cols = county, names_from = system, values_from = value, unused_fn = list(date = list) ) #> # A tibble: 2 × 5 #> county A B C date #> #> 1 Wake 3.2 4 5.5 #> 2 Guilford 2 NA 1.2 "},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"contact-list","dir":"Articles","previous_headings":"Wider","what":"Contact list","title":"Pivoting","text":"final challenge inspired Jiena Gu. Imagine contact list ’ve copied pasted website: challenging ’s variable identifies observations belong together. can fix noting every contact starts name, can create unique id counting every time see “name” field: Now unique identifier person, can pivot field value columns:","code":"contacts <- tribble( ~field, ~value, \"name\", \"Jiena McLellan\", \"company\", \"Toyota\", \"name\", \"John Smith\", \"company\", \"google\", \"email\", \"john@google.com\", \"name\", \"Huxley Ratcliffe\" ) contacts <- contacts %>% mutate( person_id = cumsum(field == \"name\") ) contacts #> # A tibble: 6 × 3 #> field value person_id #> #> 1 name Jiena McLellan 1 #> 2 company Toyota 1 #> 3 name John Smith 2 #> 4 company google 2 #> 5 email john@google.com 2 #> 6 name Huxley Ratcliffe 3 contacts %>% pivot_wider( names_from = field, values_from = value ) #> # A tibble: 3 × 4 #> person_id name company email #> #> 1 1 Jiena McLellan Toyota NA #> 2 2 John Smith google john@google.com #> 3 3 Huxley Ratcliffe NA NA"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"longer-then-wider","dir":"Articles","previous_headings":"","what":"Longer, then wider","title":"Pivoting","text":"problems can’t solved pivoting single direction. examples section show might combine pivot_longer() pivot_wider() solve complex problems.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"world-bank","dir":"Articles","previous_headings":"Longer, then wider","what":"World bank","title":"Pivoting","text":"world_bank_pop contains data World Bank population per country 2000 2018. goal produce tidy dataset variable column. ’s obvious exactly steps needed yet, ’ll start obvious problem: year spread across multiple columns. Next need consider indicator variable: SP.POP.GROW population growth, SP.POP.TOTL total population, SP.URB.* urban areas. 
Let’s split two variables: area (total urban) actual variable (population growth): Now can complete tidying pivoting variable value make TOTL GROW columns:","code":"world_bank_pop #> # A tibble: 1,064 × 20 #> country indicator `2000` `2001` `2002` `2003` `2004` `2005` #> #> 1 ABW SP.URB.TOTL 4.16e4 4.20e+4 4.22e+4 4.23e+4 4.23e+4 4.24e+4 #> 2 ABW SP.URB.GROW 1.66e0 9.56e-1 4.01e-1 1.97e-1 9.46e-2 1.94e-1 #> 3 ABW SP.POP.TOTL 8.91e4 9.07e+4 9.18e+4 9.27e+4 9.35e+4 9.45e+4 #> 4 ABW SP.POP.GROW 2.54e0 1.77e+0 1.19e+0 9.97e-1 9.01e-1 1.00e+0 #> 5 AFE SP.URB.TOTL 1.16e8 1.20e+8 1.24e+8 1.29e+8 1.34e+8 1.39e+8 #> 6 AFE SP.URB.GROW 3.60e0 3.66e+0 3.72e+0 3.71e+0 3.74e+0 3.81e+0 #> 7 AFE SP.POP.TOTL 4.02e8 4.12e+8 4.23e+8 4.34e+8 4.45e+8 4.57e+8 #> 8 AFE SP.POP.GROW 2.58e0 2.59e+0 2.61e+0 2.62e+0 2.64e+0 2.67e+0 #> 9 AFG SP.URB.TOTL 4.31e6 4.36e+6 4.67e+6 5.06e+6 5.30e+6 5.54e+6 #> 10 AFG SP.URB.GROW 1.86e0 1.15e+0 6.86e+0 7.95e+0 4.59e+0 4.47e+0 #> # ℹ 1,054 more rows #> # ℹ 12 more variables: `2006` , `2007` , `2008` , #> # `2009` , `2010` , `2011` , `2012` , `2013` , #> # `2014` , `2015` , `2016` , `2017` pop2 <- world_bank_pop %>% pivot_longer( cols = `2000`:`2017`, names_to = \"year\", values_to = \"value\" ) pop2 #> # A tibble: 19,152 × 4 #> country indicator year value #> #> 1 ABW SP.URB.TOTL 2000 41625 #> 2 ABW SP.URB.TOTL 2001 42025 #> 3 ABW SP.URB.TOTL 2002 42194 #> 4 ABW SP.URB.TOTL 2003 42277 #> 5 ABW SP.URB.TOTL 2004 42317 #> 6 ABW SP.URB.TOTL 2005 42399 #> 7 ABW SP.URB.TOTL 2006 42555 #> 8 ABW SP.URB.TOTL 2007 42729 #> 9 ABW SP.URB.TOTL 2008 42906 #> 10 ABW SP.URB.TOTL 2009 43079 #> # ℹ 19,142 more rows pop2 %>% count(indicator) #> # A tibble: 4 × 2 #> indicator n #> #> 1 SP.POP.GROW 4788 #> 2 SP.POP.TOTL 4788 #> 3 SP.URB.GROW 4788 #> 4 SP.URB.TOTL 4788 pop3 <- pop2 %>% separate(indicator, c(NA, \"area\", \"variable\")) pop3 #> # A tibble: 19,152 × 5 #> country area variable year value #> #> 1 ABW URB TOTL 2000 41625 #> 2 ABW URB TOTL 2001 42025 #> 3 ABW URB TOTL 2002 42194 #> 4 ABW URB TOTL 2003 42277 #> 5 ABW URB TOTL 2004 42317 #> 6 ABW URB TOTL 2005 42399 #> 7 ABW URB TOTL 2006 42555 #> 8 ABW URB TOTL 2007 42729 #> 9 ABW URB TOTL 2008 42906 #> 10 ABW URB TOTL 2009 43079 #> # ℹ 19,142 more rows pop3 %>% pivot_wider( names_from = variable, values_from = value ) #> # A tibble: 9,576 × 5 #> country area year TOTL GROW #> #> 1 ABW URB 2000 41625 1.66 #> 2 ABW URB 2001 42025 0.956 #> 3 ABW URB 2002 42194 0.401 #> 4 ABW URB 2003 42277 0.197 #> 5 ABW URB 2004 42317 0.0946 #> 6 ABW URB 2005 42399 0.194 #> 7 ABW URB 2006 42555 0.367 #> 8 ABW URB 2007 42729 0.408 #> 9 ABW URB 2008 42906 0.413 #> 10 ABW URB 2009 43079 0.402 #> # ℹ 9,566 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"multi-choice","dir":"Articles","previous_headings":"Longer, then wider","what":"Multi-choice","title":"Pivoting","text":"Based suggestion Maxime Wack, https://github.com/tidyverse/tidyr/issues/384), final example shows deal common way recording multiple choice data. Often get data follows: actual order isn’t important, ’d prefer individual questions columns. can achieve desired transformation two steps. 
First, make data longer, eliminating explicit NAs, adding column indicate choice chosen: make data wider, filling missing observations FALSE:","code":"multi <- tribble( ~id, ~choice1, ~choice2, ~choice3, 1, \"A\", \"B\", \"C\", 2, \"C\", \"B\", NA, 3, \"D\", NA, NA, 4, \"B\", \"D\", NA ) multi2 <- multi %>% pivot_longer( cols = !id, values_drop_na = TRUE ) %>% mutate(checked = TRUE) multi2 #> # A tibble: 8 × 4 #> id name value checked #> #> 1 1 choice1 A TRUE #> 2 1 choice2 B TRUE #> 3 1 choice3 C TRUE #> 4 2 choice1 C TRUE #> 5 2 choice2 B TRUE #> 6 3 choice1 D TRUE #> 7 4 choice1 B TRUE #> 8 4 choice2 D TRUE multi2 %>% pivot_wider( id_cols = id, names_from = value, values_from = checked, values_fill = FALSE ) #> # A tibble: 4 × 5 #> id A B C D #> #> 1 1 TRUE TRUE TRUE FALSE #> 2 2 FALSE TRUE TRUE FALSE #> 3 3 FALSE FALSE FALSE TRUE #> 4 4 FALSE TRUE FALSE TRUE"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"manual-specs","dir":"Articles","previous_headings":"","what":"Manual specs","title":"Pivoting","text":"arguments pivot_longer() pivot_wider() allow pivot wide range datasets. creativity people apply data structures seemingly endless, ’s quite possible encounter dataset can’t immediately see reshape pivot_longer() pivot_wider(). gain control pivoting, can instead create “spec” data frame describes exactly data stored column names becomes variables (vice versa). section introduces spec data structure, show use pivot_longer() pivot_wider() insufficient.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"longer-1","dir":"Articles","previous_headings":"Manual specs","what":"Longer","title":"Pivoting","text":"see works, lets return simplest case pivoting applied relig_income dataset. Now pivoting happens two steps: first create spec object (using build_longer_spec()) use describe pivoting operation: (gives result , just code. ’s need use , presented simple example using spec.) spec look like? ’s data frame one row column wide format version data present long format, two special columns start .: .name gives name column. .value gives name column values cells go . also one column spec column present long format data present wide format data. corresponds names_to argument pivot_longer() build_longer_spec() names_from argument pivot_wider() build_wider_spec(). example, income column character vector names columns pivoted.","code":"spec <- relig_income %>% build_longer_spec( cols = !religion, names_to = \"income\", values_to = \"count\" ) pivot_longer_spec(relig_income, spec) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows spec #> # A tibble: 10 × 3 #> .name .value income #> #> 1 <$10k count <$10k #> 2 $10-20k count $10-20k #> 3 $20-30k count $20-30k #> 4 $30-40k count $30-40k #> 5 $40-50k count $40-50k #> 6 $50-75k count $50-75k #> 7 $75-100k count $75-100k #> 8 $100-150k count $100-150k #> 9 >150k count >150k #> 10 Don't know/refused count Don't know/refused"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"wider-1","dir":"Articles","previous_headings":"Manual specs","what":"Wider","title":"Pivoting","text":"widen us_rent_income pivot_wider(). 
result ok, think improved: think better columns income, rent, income_moe, rent_moe, can achieve manual spec. current spec looks like : case, mutate spec carefully construct column names: Supplying spec pivot_wider() gives us result ’re looking :","code":"us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows spec1 <- us_rent_income %>% build_wider_spec( names_from = variable, values_from = c(estimate, moe) ) spec1 #> # A tibble: 4 × 3 #> .name .value variable #> #> 1 estimate_income estimate income #> 2 estimate_rent estimate rent #> 3 moe_income moe income #> 4 moe_rent moe rent spec2 <- spec1 %>% mutate( .name = paste0(variable, ifelse(.value == \"moe\", \"_moe\", \"\")) ) spec2 #> # A tibble: 4 × 3 #> .name .value variable #> #> 1 income estimate income #> 2 rent estimate rent #> 3 income_moe moe income #> 4 rent_moe moe rent us_rent_income %>% pivot_wider_spec(spec2) #> # A tibble: 52 × 6 #> GEOID NAME income rent income_moe rent_moe #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Columbia 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"by-hand","dir":"Articles","previous_headings":"Manual specs","what":"By hand","title":"Pivoting","text":"Sometimes ’s possible (convenient) compute spec, instead ’s convenient construct spec “hand”. example, take construction data, lightly modified Table 5 “completions” found https://www.census.gov/construction/nrc/index.html: sort data uncommon government agencies: column names actually belong different variables, summaries number units (1, 2-4, 5+) regions country (NE, NW, midwest, S, W). 
can easily describe tibble: yields following longer form: Note overlap units region variables; data really naturally described two independent tables.","code":"construction #> # A tibble: 9 × 9 #> Year Month `1 unit` `2 to 4 units` `5 units or more` Northeast Midwest #> #> 1 2018 Janua… 859 NA 348 114 169 #> 2 2018 Febru… 882 NA 400 138 160 #> 3 2018 March 862 NA 356 150 154 #> 4 2018 April 797 NA 447 144 196 #> 5 2018 May 875 NA 364 90 169 #> 6 2018 June 867 NA 342 76 170 #> 7 2018 July 829 NA 360 108 183 #> 8 2018 August 939 NA 286 90 205 #> 9 2018 Septe… 835 NA 304 117 175 #> # ℹ 2 more variables: South , West spec <- tribble( ~.name, ~.value, ~units, ~region, \"1 unit\", \"n\", \"1\", NA, \"2 to 4 units\", \"n\", \"2-4\", NA, \"5 units or more\", \"n\", \"5+\", NA, \"Northeast\", \"n\", NA, \"Northeast\", \"Midwest\", \"n\", NA, \"Midwest\", \"South\", \"n\", NA, \"South\", \"West\", \"n\", NA, \"West\", ) construction %>% pivot_longer_spec(spec) #> # A tibble: 63 × 5 #> Year Month units region n #> #> 1 2018 January 1 NA 859 #> 2 2018 January 2-4 NA NA #> 3 2018 January 5+ NA 348 #> 4 2018 January NA Northeast 114 #> 5 2018 January NA Midwest 169 #> 6 2018 January NA South 596 #> 7 2018 January NA West 339 #> 8 2018 February 1 NA 882 #> 9 2018 February 2-4 NA NA #> 10 2018 February 5+ NA 400 #> # ℹ 53 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"theory","dir":"Articles","previous_headings":"Manual specs","what":"Theory","title":"Pivoting","text":"One neat property spec need spec pivot_longer() pivot_wider(). makes clear two operations symmetric: pivoting spec allows us precise exactly pivot_longer(df, spec = spec) changes shape df: nrow(df) * nrow(spec) rows, ncol(df) - nrow(spec) + ncol(spec) - 2 columns.","code":"construction %>% pivot_longer_spec(spec) %>% pivot_wider_spec(spec) #> # A tibble: 9 × 9 #> Year Month `1 unit` `2 to 4 units` `5 units or more` Northeast Midwest #> #> 1 2018 Janua… 859 NA 348 114 169 #> 2 2018 Febru… 882 NA 400 138 160 #> 3 2018 March 862 NA 356 150 154 #> 4 2018 April 797 NA 447 144 196 #> 5 2018 May 875 NA 364 90 169 #> 6 2018 June 867 NA 342 76 170 #> 7 2018 July 829 NA 360 108 183 #> 8 2018 August 939 NA 286 90 205 #> 9 2018 Septe… 835 NA 304 117 175 #> # ℹ 2 more variables: South , West "},{"path":"https://tidyr.tidyverse.org/dev/articles/programming.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Programming with tidyr","text":"tidyr verbs use tidy evaluation make interactive data exploration fast fluid. Tidy evaluation special type non-standard evaluation used throughout tidyverse. ’s typical tidyr code: Tidy evaluation can use !Species say “columns except Species”, without quote column name (\"Species\") refer enclosing data frame (iris$Species). Two basic forms tidy evaluation used tidyr: Tidy selection: drop_na(), fill(), pivot_longer()/pivot_wider(), nest()/unnest(), separate()/extract(), unite() let select variables based position, name, type (e.g. 1:3, starts_with(\"x\"), .numeric). Literally, can use techniques dplyr::select(). Data masking: expand(), crossing() nesting() let refer use data variables variables environment (.e. write my_variable df$my_variable). focus tidy selection , since ’s common. can learn data masking equivalent vignette dplyr: https://dplyr.tidyverse.org/dev/articles/programming.html. considerations writing tidyr code packages, please see vignette(\"-packages\"). 
’ve pointed tidyr’s tidy evaluation interface optimized interactive exploration. flip side adds challenges indirect use, .e. ’re working inside loop function. vignette shows overcome challenges. ’ll first go basics tidy selection data masking, talk use indirectly, show number recipes solve common problems. go , reveal version tidyr ’re using make small dataset use examples.","code":"library(tidyr) iris %>% nest(data = !Species) #> # A tibble: 3 × 2 #> Species data #> #> 1 setosa #> 2 versicolor #> 3 virginica packageVersion(\"tidyr\") #> [1] '1.3.1.9000' mini_iris <- as_tibble(iris)[c(1, 2, 51, 52, 101, 102), ] mini_iris #> # A tibble: 6 × 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3 1.4 0.2 setosa #> 3 7 3.2 4.7 1.4 versicolor #> 4 6.4 3.2 4.5 1.5 versicolor #> 5 6.3 3.3 6 2.5 virginica #> 6 5.8 2.7 5.1 1.9 virginica"},{"path":"https://tidyr.tidyverse.org/dev/articles/programming.html","id":"tidy-selection","dir":"Articles","previous_headings":"","what":"Tidy selection","title":"Programming with tidyr","text":"Underneath functions use tidy selection tidyselect package. provides miniature domain specific language makes easy select columns name, position, type. example: select(df, 1) selects first column; select(df, last_col()) selects last column. select(df, c(, b, c)) selects columns , b, c. select(df, starts_with(\"\")) selects columns whose name starts “”; select(df, ends_with(\"z\")) selects columns whose name ends “z”. select(df, (.numeric)) selects numeric columns. can see details ?tidyr_tidy_select.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/programming.html","id":"indirection","dir":"Articles","previous_headings":"Tidy selection","what":"Indirection","title":"Programming with tidyr","text":"Tidy selection makes common task easier cost making less common task harder. want use tidy select indirectly column specification stored intermediate variable, ’ll need learn new tools. three main cases comes : tidy-select specification function argument, must embrace argument surrounding doubled braces. character vector variable names, must use all_of() any_of() depending whether want function error variable found. functions allow write loops function takes variable names character vector. complicated cases, might want use tidyselect directly: Learn vignette(\"tidyselect\"). Note many tidyr functions use ... can easily select many variables, e.g. fill(df, x, y, z). now believe disadvantages approach outweigh benefits, interface better fill(df, c(x, y, z)). new functions select columns, please just use single argument ....","code":"nest_egg <- function(df, cols) { nest(df, egg = {{ cols }}) } nest_egg(mini_iris, !Species) #> # A tibble: 3 × 2 #> Species egg #> #> 1 setosa #> 2 versicolor #> 3 virginica nest_egg <- function(df, cols) { nest(df, egg = all_of(cols)) } vars <- c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\") nest_egg(mini_iris, vars) #> # A tibble: 3 × 2 #> Species egg #> #> 1 setosa #> 2 versicolor #> 3 virginica sel_vars <- function(df, cols) { tidyselect::eval_select(rlang::enquo(cols), df) } sel_vars(mini_iris, !Species) #> Sepal.Length Sepal.Width Petal.Length Petal.Width #> 1 2 3 4"},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Rectangling","text":"Rectangling art craft taking deeply nested list (often sourced wild caught JSON XML) taming tidy data set rows columns. 
three functions tidyr particularly useful rectangling: unnest_longer() takes element list-column makes new row. unnest_wider() takes element list-column makes new column. hoist() similar unnest_wider() plucks selected components, can reach multiple levels. (Alternative, complex inputs need rectangle nested list according specification, see tibblify package.) large number data rectangling problems can solved combining jsonlite::read_json() functions splash dplyr (largely eliminating prior approaches combined mutate() multiple purrr::map()s). Note jsonlite another important function called fromJSON(). don’t recommend performs automatic simplification (simplifyVector = TRUE). often works well, particularly simple cases, think ’re better rectangling know exactly ’s happening can easily handle complicated nested structures. illustrate techniques, ’ll use repurrrsive package, provides number deeply nested lists originally mostly captured web APIs.","code":"library(tidyr) library(dplyr) library(repurrrsive)"},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"github-users","dir":"Articles","previous_headings":"","what":"GitHub users","title":"Rectangling","text":"’ll start gh_users, list contains information six GitHub users. begin, put gh_users list data frame: seems bit counter-intuitive: first step making list simpler make complicated? data frame big advantage: bundles together multiple vectors everything tracked together single object. user named list, element represents column. two ways turn list components columns. unnest_wider() takes every component makes new column: case, many components don’t need can instead use hoist(). hoist() allows us pull selected components using syntax purrr::pluck(): hoist() removes named components user list-column, can think moving components inner list top-level data frame.","code":"users <- tibble(user = gh_users) names(users$user[[1]]) #> [1] \"login\" \"id\" \"avatar_url\" #> [4] \"gravatar_id\" \"url\" \"html_url\" #> [7] \"followers_url\" \"following_url\" \"gists_url\" #> [10] \"starred_url\" \"subscriptions_url\" \"organizations_url\" #> [13] \"repos_url\" \"events_url\" \"received_events_url\" #> [16] \"type\" \"site_admin\" \"name\" #> [19] \"company\" \"blog\" \"location\" #> [22] \"email\" \"hireable\" \"bio\" #> [25] \"public_repos\" \"public_gists\" \"followers\" #> [28] \"following\" \"created_at\" \"updated_at\" users %>% unnest_wider(user) #> # A tibble: 6 × 30 #> login id avatar_url gravatar_id url html_url followers_url #> #> 1 gaborcsardi 660288 https://a… \"\" http… https:/… https://api.… #> 2 jennybc 599454 https://a… \"\" http… https:/… https://api.… #> 3 jtleek 1571674 https://a… \"\" http… https:/… https://api.… #> 4 juliasilge 12505835 https://a… \"\" http… https:/… https://api.… #> 5 leeper 3505428 https://a… \"\" http… https:/… https://api.… #> 6 masalmon 8360597 https://a… \"\" http… https:/… https://api.… #> # ℹ 23 more variables: following_url , gists_url , #> # starred_url , subscriptions_url , organizations_url , #> # repos_url , events_url , received_events_url , #> # type , site_admin , name , company , blog , #> # location , email , hireable , bio , #> # public_repos , public_gists , followers , #> # following , created_at , updated_at users %>% hoist(user, followers = \"followers\", login = \"login\", url = \"html_url\" ) #> # A tibble: 6 × 4 #> followers login url user #> #> 1 303 gaborcsardi https://github.com/gaborcsardi #> 2 780 jennybc https://github.com/jennybc #> 3 3958 jtleek 
https://github.com/jtleek #> 4 115 juliasilge https://github.com/juliasilge #> 5 213 leeper https://github.com/leeper #> 6 34 masalmon https://github.com/masalmon "},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"github-repos","dir":"Articles","previous_headings":"","what":"GitHub repos","title":"Rectangling","text":"start gh_repos similarly, putting tibble: time elements repos list repositories belong user. observations, become new rows, use unnest_longer() rather unnest_wider(): can use unnest_wider() hoist(): Note use c(\"owner\", \"login\"): allows us reach two levels deep inside list. alternative approach pull just owner put element column:","code":"repos <- tibble(repo = gh_repos) repos #> # A tibble: 6 × 1 #> repo #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 repos <- repos %>% unnest_longer(repo) repos #> # A tibble: 176 × 1 #> repo #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #> # ℹ 166 more rows repos %>% hoist(repo, login = c(\"owner\", \"login\"), name = \"name\", homepage = \"homepage\", watchers = \"watchers_count\" ) #> # A tibble: 176 × 5 #> login name homepage watchers repo #> #> 1 gaborcsardi after NA 5 #> 2 gaborcsardi argufy NA 19 #> 3 gaborcsardi ask NA 5 #> 4 gaborcsardi baseimports NA 0 #> 5 gaborcsardi citest NA 0 #> 6 gaborcsardi clisymbols \"\" 18 #> 7 gaborcsardi cmaker NA 0 #> 8 gaborcsardi cmark NA 0 #> 9 gaborcsardi conditions NA 0 #> 10 gaborcsardi crayon NA 52 #> # ℹ 166 more rows repos %>% hoist(repo, owner = \"owner\") %>% unnest_wider(owner) #> # A tibble: 176 × 18 #> login id avatar_url gravatar_id url html_url followers_url #> #> 1 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 2 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 3 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 4 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 5 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 6 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 7 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 8 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 9 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 10 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> # ℹ 166 more rows #> # ℹ 11 more variables: following_url , gists_url , #> # starred_url , subscriptions_url , organizations_url , #> # repos_url , events_url , received_events_url , #> # type , site_admin , repo "},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"game-of-thrones-characters","dir":"Articles","previous_headings":"","what":"Game of Thrones characters","title":"Rectangling","text":"got_chars similar structure gh_users: ’s list named lists, element inner list describes attribute GoT character. start way, first creating data frame unnesting component column: complex gh_users component char list, giving us collection list-columns: next depend purposes analysis. 
Maybe want row every book TV series character appears : maybe want build table lets match title name: (Note empty titles (\"\") due infelicity input got_chars: ideally people without titles title vector length 0, title vector length 1 containing empty string.)","code":"chars <- tibble(char = got_chars) chars #> # A tibble: 30 × 1 #> char #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #> # ℹ 20 more rows chars2 <- chars %>% unnest_wider(char) chars2 #> # A tibble: 30 × 18 #> url id name gender culture born died alive titles aliases #> #> 1 https://ww… 1022 Theo… Male \"Ironb… \"In … \"\" TRUE #> 2 https://ww… 1052 Tyri… Male \"\" \"In … \"\" TRUE #> 3 https://ww… 1074 Vict… Male \"Ironb… \"In … \"\" TRUE #> 4 https://ww… 1109 Will Male \"\" \"\" \"In … FALSE #> 5 https://ww… 1166 Areo… Male \"Norvo… \"In … \"\" TRUE #> 6 https://ww… 1267 Chett Male \"\" \"At … \"In … FALSE #> 7 https://ww… 1295 Cres… Male \"\" \"In … \"In … FALSE #> 8 https://ww… 130 Aria… Female \"Dorni… \"In … \"\" TRUE #> 9 https://ww… 1303 Daen… Female \"Valyr… \"In … \"\" TRUE #> 10 https://ww… 1319 Davo… Male \"Weste… \"In … \"\" TRUE #> # ℹ 20 more rows #> # ℹ 8 more variables: father , mother , spouse , #> # allegiances , books , povBooks , tvSeries , #> # playedBy chars2 %>% select_if(is.list) #> # A tibble: 30 × 7 #> titles aliases allegiances books povBooks tvSeries playedBy #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #> # ℹ 20 more rows chars2 %>% select(name, books, tvSeries) %>% pivot_longer(c(books, tvSeries), names_to = \"media\", values_to = \"value\") %>% unnest_longer(value) #> # A tibble: 179 × 3 #> name media value #> #> 1 Theon Greyjoy books A Game of Thrones #> 2 Theon Greyjoy books A Storm of Swords #> 3 Theon Greyjoy books A Feast for Crows #> 4 Theon Greyjoy tvSeries Season 1 #> 5 Theon Greyjoy tvSeries Season 2 #> 6 Theon Greyjoy tvSeries Season 3 #> 7 Theon Greyjoy tvSeries Season 4 #> 8 Theon Greyjoy tvSeries Season 5 #> 9 Theon Greyjoy tvSeries Season 6 #> 10 Tyrion Lannister books A Feast for Crows #> # ℹ 169 more rows chars2 %>% select(name, title = titles) %>% unnest_longer(title) #> # A tibble: 59 × 2 #> name title #> #> 1 Theon Greyjoy \"Prince of Winterfell\" #> 2 Theon Greyjoy \"Lord of the Iron Islands (by law of the green lands… #> 3 Tyrion Lannister \"Acting Hand of the King (former)\" #> 4 Tyrion Lannister \"Master of Coin (former)\" #> 5 Victarion Greyjoy \"Lord Captain of the Iron Fleet\" #> 6 Victarion Greyjoy \"Master of the Iron Victory\" #> 7 Will \"\" #> 8 Areo Hotah \"Captain of the Guard at Sunspear\" #> 9 Chett \"\" #> 10 Cressen \"Maester\" #> # ℹ 49 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"geocoding-with-google","dir":"Articles","previous_headings":"","what":"Geocoding with google","title":"Rectangling","text":"Next ’ll tackle complex form data comes Google’s geocoding service, stored repurssive package json list-column named lists, makes sense start unnest_wider(): Notice results list lists. cities 1 element (representing unique match geocoding API), Washington Arlington two. 
can pull separate rows unnest_longer(): Now components, revealed unnest_wider(): can find latitude longitude unnesting geometry: location: also just look first address city: use hoist() dive deeply get directly lat lng:","code":"repurrrsive::gmaps_cities #> # A tibble: 5 × 2 #> city json #> #> 1 Houston #> 2 Washington #> 3 New York #> 4 Chicago #> 5 Arlington repurrrsive::gmaps_cities %>% unnest_wider(json) #> # A tibble: 5 × 3 #> city results status #> #> 1 Houston OK #> 2 Washington OK #> 3 New York OK #> 4 Chicago OK #> 5 Arlington OK repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) #> # A tibble: 7 × 3 #> city results status #> #> 1 Houston OK #> 2 Washington OK #> 3 Washington OK #> 4 New York OK #> 5 Chicago OK #> 6 Arlington OK #> 7 Arlington OK repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) %>% unnest_wider(results) #> # A tibble: 7 × 7 #> city address_components formatted_address geometry place_id types #> #> 1 Houst… Houston, TX, USA ChIJAYW… #> 2 Washi… Washington, USA ChIJ-bD… #> 3 Washi… Washington, DC, … ChIJW-T… #> 4 New Y… New York, NY, USA ChIJOwg… #> 5 Chica… Chicago, IL, USA ChIJ7cv… #> 6 Arlin… Arlington, TX, U… ChIJ05g… #> 7 Arlin… Arlington, VA, U… ChIJD6e… #> # ℹ 1 more variable: status repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) %>% unnest_wider(results) %>% unnest_wider(geometry) #> # A tibble: 7 × 10 #> city address_components formatted_address bounds location #> #> 1 Houston Houston, TX, USA #> 2 Washingt… Washington, USA #> 3 Washingt… Washington, DC, … #> 4 New York New York, NY, USA #> 5 Chicago Chicago, IL, USA #> 6 Arlington Arlington, TX, U… #> 7 Arlington Arlington, VA, U… #> # ℹ 5 more variables: location_type , viewport , #> # place_id , types , status repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) %>% unnest_wider(results) %>% unnest_wider(geometry) %>% unnest_wider(location) #> # A tibble: 7 × 11 #> city address_components formatted_address bounds lat lng #> #> 1 Houston Houston, TX, USA 29.8 -95.4 #> 2 Washingt… Washington, USA 47.8 -121. #> 3 Washingt… Washington, DC, … 38.9 -77.0 #> 4 New York New York, NY, USA 40.7 -74.0 #> 5 Chicago Chicago, IL, USA 41.9 -87.6 #> 6 Arlington Arlington, TX, U… 32.7 -97.1 #> 7 Arlington Arlington, VA, U… 38.9 -77.1 #> # ℹ 5 more variables: location_type , viewport , #> # place_id , types , status repurrrsive::gmaps_cities %>% unnest_wider(json) %>% hoist(results, first_result = 1) %>% unnest_wider(first_result) %>% unnest_wider(geometry) %>% unnest_wider(location) #> # A tibble: 5 × 12 #> city address_components formatted_address bounds lat lng #> #> 1 Houston Houston, TX, USA 29.8 -95.4 #> 2 Washingt… Washington, USA 47.8 -121. #> 3 New York New York, NY, USA 40.7 -74.0 #> 4 Chicago Chicago, IL, USA 41.9 -87.6 #> 5 Arlington Arlington, TX, U… 32.7 -97.1 #> # ℹ 6 more variables: location_type , viewport , #> # place_id , types , results , status repurrrsive::gmaps_cities %>% hoist(json, lat = list(\"results\", 1, \"geometry\", \"location\", \"lat\"), lng = list(\"results\", 1, \"geometry\", \"location\", \"lng\") ) #> # A tibble: 5 × 4 #> city lat lng json #> #> 1 Houston 29.8 -95.4 #> 2 Washington 47.8 -121. 
#> 3 New York 40.7 -74.0 #> 4 Chicago 41.9 -87.6 #> 5 Arlington 32.7 -97.1 "},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"sharla-gelfands-discography","dir":"Articles","previous_headings":"","what":"Sharla Gelfand’s discography","title":"Rectangling","text":"’ll finish complex list, Sharla Gelfand’s discography. ’ll start usual way: putting list single column data frame, widening component column. also parse date_added column real date-time1. level, see information disc added Sharla’s discography, information disc . need widen basic_information column: Unfortunately fails ’s id column inside basic_information. can quickly see ’s going setting names_repair = \"unique\": problem basic_information repeats id column ’s also stored top-level, can just drop : Alternatively, use hoist(): quickly extract name first label artist indexing deeply nested list. systematic approach create separate tables artist label: join back original dataset needed.","code":"discs <- tibble(disc = discog) %>% unnest_wider(disc) %>% mutate(date_added = as.POSIXct(strptime(date_added, \"%Y-%m-%dT%H:%M:%S\"))) discs #> # A tibble: 155 × 5 #> instance_id date_added basic_information id rating #> #> 1 354823933 2019-02-16 17:48:59 7496378 0 #> 2 354092601 2019-02-13 14:13:11 4490852 0 #> 3 354091476 2019-02-13 14:07:23 9827276 0 #> 4 351244906 2019-02-02 11:39:58 9769203 0 #> 5 351244801 2019-02-02 11:39:37 7237138 0 #> 6 351052065 2019-02-01 20:40:53 13117042 0 #> 7 350315345 2019-01-29 15:48:37 7113575 0 #> 8 350315103 2019-01-29 15:47:22 10540713 0 #> 9 350314507 2019-01-29 15:44:08 11260950 0 #> 10 350314047 2019-01-29 15:41:35 11726853 0 #> # ℹ 145 more rows discs %>% unnest_wider(basic_information) #> Error in `unnest_wider()`: #> ! Can't duplicate names between the affected columns and the #> original data. #> ✖ These names are duplicated: #> ℹ `id`, from `basic_information`. #> ℹ Use `names_sep` to disambiguate using the column name. #> ℹ Or use `names_repair` to specify a repair strategy. 
discs %>% unnest_wider(basic_information, names_repair = \"unique\") #> New names: #> • `id` -> `id...7` #> • `id` -> `id...14` #> # A tibble: 155 × 15 #> instance_id date_added labels year master_url artists id...7 #> #> 1 354823933 2019-02-16 17:48:59 2015 NA 7.50e6 #> 2 354092601 2019-02-13 14:13:11 2013 https://ap… 4.49e6 #> 3 354091476 2019-02-13 14:07:23 2017 https://ap… 9.83e6 #> 4 351244906 2019-02-02 11:39:58 2017 https://ap… 9.77e6 #> 5 351244801 2019-02-02 11:39:37 2015 https://ap… 7.24e6 #> 6 351052065 2019-02-01 20:40:53 2019 https://ap… 1.31e7 #> 7 350315345 2019-01-29 15:48:37 2014 https://ap… 7.11e6 #> 8 350315103 2019-01-29 15:47:22 2015 https://ap… 1.05e7 #> 9 350314507 2019-01-29 15:44:08 2017 https://ap… 1.13e7 #> 10 350314047 2019-01-29 15:41:35 2017 NA 1.17e7 #> # ℹ 145 more rows #> # ℹ 8 more variables: thumb , title , formats , #> # cover_image , resource_url , master_id , #> # id...14 , rating discs %>% select(!id) %>% unnest_wider(basic_information) #> # A tibble: 155 × 14 #> instance_id date_added labels year master_url artists id #> #> 1 354823933 2019-02-16 17:48:59 2015 NA 7.50e6 #> 2 354092601 2019-02-13 14:13:11 2013 https://ap… 4.49e6 #> 3 354091476 2019-02-13 14:07:23 2017 https://ap… 9.83e6 #> 4 351244906 2019-02-02 11:39:58 2017 https://ap… 9.77e6 #> 5 351244801 2019-02-02 11:39:37 2015 https://ap… 7.24e6 #> 6 351052065 2019-02-01 20:40:53 2019 https://ap… 1.31e7 #> 7 350315345 2019-01-29 15:48:37 2014 https://ap… 7.11e6 #> 8 350315103 2019-01-29 15:47:22 2015 https://ap… 1.05e7 #> 9 350314507 2019-01-29 15:44:08 2017 https://ap… 1.13e7 #> 10 350314047 2019-01-29 15:41:35 2017 NA 1.17e7 #> # ℹ 145 more rows #> # ℹ 7 more variables: thumb , title , formats , #> # cover_image , resource_url , master_id , rating discs %>% hoist(basic_information, title = \"title\", year = \"year\", label = list(\"labels\", 1, \"name\"), artist = list(\"artists\", 1, \"name\") ) #> # A tibble: 155 × 9 #> instance_id date_added title year label artist #> #> 1 354823933 2019-02-16 17:48:59 Demo 2015 Tobi… Mollot #> 2 354092601 2019-02-13 14:13:11 Observant Com El Mo… 2013 La V… Una B… #> 3 354091476 2019-02-13 14:07:23 I 2017 La V… S.H.I… #> 4 351244906 2019-02-02 11:39:58 Oído Absoluto 2017 La V… Rata … #> 5 351244801 2019-02-02 11:39:37 A Cat's Cause, No D… 2015 Kato… Ivy (… #> 6 351052065 2019-02-01 20:40:53 Tashme 2019 High… Tashme #> 7 350315345 2019-01-29 15:48:37 Demo 2014 Mind… Desgr… #> 8 350315103 2019-01-29 15:47:22 Let The Miracles Be… 2015 Not … Phant… #> 9 350314507 2019-01-29 15:44:08 Sub Space 2017 Not … Sub S… #> 10 350314047 2019-01-29 15:41:35 Demo 2017 Pres… Small… #> # ℹ 145 more rows #> # ℹ 3 more variables: basic_information , id , rating discs %>% hoist(basic_information, artist = \"artists\") %>% select(disc_id = id, artist) %>% unnest_longer(artist) %>% unnest_wider(artist) #> # A tibble: 167 × 8 #> disc_id join name anv tracks role resource_url id #> #> 1 7496378 \"\" Mollot \"\" \"\" \"\" https://api… 4.62e6 #> 2 4490852 \"\" Una Bèstia Incon… \"\" \"\" \"\" https://api… 3.19e6 #> 3 9827276 \"\" S.H.I.T. 
(3) \"\" \"\" \"\" https://api… 2.77e6 #> 4 9769203 \"\" Rata Negra \"\" \"\" \"\" https://api… 4.28e6 #> 5 7237138 \"\" Ivy (18) \"\" \"\" \"\" https://api… 3.60e6 #> 6 13117042 \"\" Tashme \"\" \"\" \"\" https://api… 5.21e6 #> 7 7113575 \"\" Desgraciados \"\" \"\" \"\" https://api… 4.45e6 #> 8 10540713 \"\" Phantom Head \"\" \"\" \"\" https://api… 4.27e6 #> 9 11260950 \"\" Sub Space (2) \"\" \"\" \"\" https://api… 5.69e6 #> 10 11726853 \"\" Small Man (2) \"\" \"\" \"\" https://api… 6.37e6 #> # ℹ 157 more rows discs %>% hoist(basic_information, format = \"formats\") %>% select(disc_id = id, format) %>% unnest_longer(format) %>% unnest_wider(format) %>% unnest_longer(descriptions) #> # A tibble: 258 × 5 #> disc_id descriptions text name qty #> #> 1 7496378 \"Numbered\" Black Cassette 1 #> 2 4490852 \"LP\" NA Vinyl 1 #> 3 9827276 \"7\\\"\" NA Vinyl 1 #> 4 9827276 \"45 RPM\" NA Vinyl 1 #> 5 9827276 \"EP\" NA Vinyl 1 #> 6 9769203 \"LP\" NA Vinyl 1 #> 7 9769203 \"Album\" NA Vinyl 1 #> 8 7237138 \"7\\\"\" NA Vinyl 1 #> 9 7237138 \"45 RPM\" NA Vinyl 1 #> 10 13117042 \"7\\\"\" NA Vinyl 1 #> # ℹ 248 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"data-tidying","dir":"Articles","previous_headings":"","what":"Data tidying","title":"Tidy data","text":"often said 80% data analysis spent cleaning preparing data. ’s just first step, must repeated many times course analysis new problems come light new data collected. get handle problem, paper focuses small, important, aspect data cleaning call data tidying: structuring datasets facilitate analysis. principles tidy data provide standard way organise data values within dataset. standard makes initial data cleaning easier don’t need start scratch reinvent wheel every time. tidy data standard designed facilitate initial exploration analysis data, simplify development data analysis tools work well together. Current tools often require translation. spend time munging output one tool can input another. Tidy datasets tidy tools work hand hand make data analysis easier, allowing focus interesting domain problem, uninteresting logistics data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"defining","dir":"Articles","previous_headings":"","what":"Defining tidy data","title":"Tidy data","text":"Happy families alike; every unhappy family unhappy way — Leo Tolstoy Like families, tidy datasets alike every messy dataset messy way. Tidy datasets provide standardized way link structure dataset (physical layout) semantics (meaning). section, ’ll provide standard vocabulary describing structure semantics dataset, use definitions define tidy data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"data-structure","dir":"Articles","previous_headings":"Defining tidy data","what":"Data structure","title":"Tidy data","text":"statistical datasets data frames made rows columns. columns almost always labeled rows sometimes labeled. following code provides data imaginary classroom format commonly seen wild. table three columns four rows, rows columns labeled. many ways structure underlying data. following table shows data , rows columns transposed. data , layout different. vocabulary rows columns simply rich enough describe two tables represent data. 
addition appearance, need way describe underlying semantics, meaning, values displayed table.","code":"library(tibble) classroom <- tribble( ~name, ~quiz1, ~quiz2, ~test1, \"Billy\", NA, \"D\", \"C\", \"Suzy\", \"F\", NA, NA, \"Lionel\", \"B\", \"C\", \"B\", \"Jenny\", \"A\", \"A\", \"B\" ) classroom #> # A tibble: 4 × 4 #> name quiz1 quiz2 test1 #> #> 1 Billy NA D C #> 2 Suzy F NA NA #> 3 Lionel B C B #> 4 Jenny A A B tribble( ~assessment, ~Billy, ~Suzy, ~Lionel, ~Jenny, \"quiz1\", NA, \"F\", \"B\", \"A\", \"quiz2\", \"D\", NA, \"C\", \"A\", \"test1\", \"C\", NA, \"B\", \"B\" ) #> # A tibble: 3 × 5 #> assessment Billy Suzy Lionel Jenny #> #> 1 quiz1 NA F B A #> 2 quiz2 D NA C A #> 3 test1 C NA B B"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"data-semantics","dir":"Articles","previous_headings":"Defining tidy data","what":"Data semantics","title":"Tidy data","text":"dataset collection values, usually either numbers (quantitative) strings (qualitative). Values organised two ways. Every value belongs variable observation. variable contains values measure underlying attribute (like height, temperature, duration) across units. observation contains values measured unit (like person, day, race) across attributes. tidy version classroom data looks like : (’ll learn functions work little later) makes values, variables, observations clear. dataset contains 36 values representing three variables 12 observations. variables : name, four possible values (Billy, Suzy, Lionel, Jenny). assessment, three possible values (quiz1, quiz2, test1). grade, five six values depending think missing value (, B, C, D, F, NA). tidy data frame explicitly tells us definition observation. classroom, every combination name assessment single measured observation. dataset also informs us missing values, can meaning. Billy absent first quiz, tried salvage grade. Suzy failed first quiz, decided drop class. calculate Billy’s final grade, might replace missing value F (might get second chance take quiz). However, want know class average Test 1, dropping Suzy’s structural missing value appropriate imputing new value. given dataset, ’s usually easy figure observations variables, surprisingly difficult precisely define variables observations general. example, columns classroom data height weight happy call variables. columns height width, less clear cut, might think height width values dimension variable. columns home phone work phone, treat two variables, fraud detection environment might want variables phone number number type use one phone number multiple people might suggest fraud. general rule thumb easier describe functional relationships variables (e.g., z linear combination x y, density ratio weight volume) rows, easier make comparisons groups observations (e.g., average group vs. average group b) groups columns. given analysis, may multiple levels observation. example, trial new allergy medication might three observational types: demographic data collected person (age, sex, race), medical data collected person day (number sneezes, redness eyes), meteorological data collected day (temperature, pollen count). Variables may change course analysis. Often variables raw data fine grained, may add extra modelling complexity little explanatory gain. example, many surveys ask variations question better get underlying trait. early stages analysis, variables correspond questions. later stages, change focus traits, computed averaging together multiple questions. 
considerably simplifies analysis don’t need hierarchical model, can often pretend data continuous, discrete.","code":"library(tidyr) library(dplyr) classroom2 <- classroom %>% pivot_longer(quiz1:test1, names_to = \"assessment\", values_to = \"grade\") %>% arrange(name, assessment) classroom2 #> # A tibble: 12 × 3 #> name assessment grade #> #> 1 Billy quiz1 NA #> 2 Billy quiz2 D #> 3 Billy test1 C #> 4 Jenny quiz1 A #> 5 Jenny quiz2 A #> 6 Jenny test1 B #> 7 Lionel quiz1 B #> 8 Lionel quiz2 C #> 9 Lionel test1 B #> 10 Suzy quiz1 F #> # ℹ 2 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"tidy-data","dir":"Articles","previous_headings":"Defining tidy data","what":"Tidy data","title":"Tidy data","text":"Tidy data standard way mapping meaning dataset structure. dataset messy tidy depending rows, columns tables matched observations, variables types. tidy data: variable column; column variable. observation row; row observation. value cell; cell single value. Codd’s 3rd normal form, constraints framed statistical language, focus put single dataset rather many connected datasets common relational databases. Messy data arrangement data. Tidy data makes easy analyst computer extract needed variables provides standard way structuring dataset. Compare different versions classroom data: messy version need use different strategies extract different variables. slows analysis invites errors. consider many data analysis operations involve values variable (every aggregation function), can see important extract values simple, standard way. Tidy data particularly well suited vectorised programming languages like R, layout ensures values different variables observation always paired. order variables observations affect analysis, good ordering makes easier scan raw values. One way organising variables role analysis: values fixed design data collection, measured course experiment? Fixed variables describe experimental design known advance. Computer scientists often call fixed variables dimensions, statisticians usually denote subscripts random variables. Measured variables actually measure study. Fixed variables come first, followed measured variables, ordered related variables contiguous. Rows can ordered first variable, breaking ties second subsequent (fixed) variables. convention adopted tabular displays paper.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"tidying","dir":"Articles","previous_headings":"","what":"Tidying messy datasets","title":"Tidy data","text":"Real datasets can, often , violate three precepts tidy data almost every way imaginable. occasionally get dataset can start analysing immediately, exception, rule. section describes five common problems messy datasets, along remedies: Column headers values, variable names. Multiple variables stored one column. Variables stored rows columns. Multiple types observational units stored table. single observational unit stored multiple tables. Surprisingly, messy datasets, including types messiness explicitly described , can tidied small set tools: pivoting (longer wider) separating. 
following sections illustrate problem real dataset encountered, show tidy .","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"column-headers-are-values-not-variable-names","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Column headers are values, not variable names","title":"Tidy data","text":"common type messy dataset tabular data designed presentation, variables form rows columns, column headers values, variable names. call arrangement messy, cases can extremely useful. provides efficient storage completely crossed designs, can lead extremely efficient computation desired operations can expressed matrix operations. following code shows subset typical dataset form. dataset explores relationship income religion US. comes report produced Pew Research Center, American think-tank collects data attitudes topics ranging religion internet, produces many reports contain datasets format. dataset three variables, religion, income frequency. tidy , need pivot non-variable columns two-column key-value pair. action often described making wide dataset longer (taller). pivoting variables, need provide name new key-value columns create. defining columns pivot (every column except religion), need name key column, name variable defined values column headings. case, ’s income. second argument name value column, frequency. form tidy column represents variable row represents observation, case demographic unit corresponding combination religion income. format also used record regularly spaced observations time. example, Billboard dataset shown records date song first entered billboard top 100. variables artist, track, date.entered, rank week. rank week enters top 100 recorded 75 columns, wk1 wk75. form storage tidy, useful data entry. reduces duplication since otherwise song week need row, song metadata like title artist need repeated. discussed depth multiple types. tidy dataset, first use pivot_longer() make dataset longer. transform columns wk1 wk76, making new column names, week, new value values, rank: use values_drop_na = TRUE drop missing values rank column. data, missing values represent weeks song wasn’t charts, can safely dropped. case ’s also nice little cleaning, converting week variable number, figuring date corresponding week charts: Finally, ’s always good idea sort data. 
artist, track week: date rank:","code":"relig_income #> # A tibble: 18 × 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` #> #> 1 Agnostic 27 34 60 81 76 137 #> 2 Atheist 12 27 37 52 35 70 #> 3 Buddhist 27 21 30 34 33 58 #> 4 Catholic 418 617 732 670 638 1116 #> 5 Don’t know/r… 15 14 15 11 10 35 #> 6 Evangelical … 575 869 1064 982 881 1486 #> 7 Hindu 1 9 7 9 11 34 #> 8 Historically… 228 244 236 238 197 223 #> 9 Jehovah's Wi… 20 27 24 24 21 30 #> 10 Jewish 19 19 25 25 30 95 #> # ℹ 8 more rows #> # ℹ 4 more variables: `$75-100k` , `$100-150k` , `>150k` , #> # `Don't know/refused` relig_income %>% pivot_longer(-religion, names_to = \"income\", values_to = \"frequency\") #> # A tibble: 180 × 3 #> religion income frequency #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows billboard #> # A tibble: 317 × 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 #> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 #> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA #> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 #> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 #> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 #> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA #> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 #> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 #> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 #> # ℹ 307 more rows #> # ℹ 69 more variables: wk8 , wk9 , wk10 , wk11 , #> # wk12 , wk13 , wk14 , wk15 , wk16 , #> # wk17 , wk18 , wk19 , wk20 , wk21 , #> # wk22 , wk23 , wk24 , wk25 , wk26 , #> # wk27 , wk28 , wk29 , wk30 , wk31 , #> # wk32 , wk33 , wk34 , wk35 , wk36 , … billboard2 <- billboard %>% pivot_longer( wk1:wk76, names_to = \"week\", values_to = \"rank\", values_drop_na = TRUE ) billboard2 #> # A tibble: 5,307 × 5 #> artist track date.entered week rank #> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87 #> 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92 #> # ℹ 5,297 more rows billboard3 <- billboard2 %>% mutate( week = as.integer(gsub(\"wk\", \"\", week)), date = as.Date(date.entered) + 7 * (week - 1), date.entered = NULL ) billboard3 #> # A tibble: 5,307 × 5 #> artist track week rank date #> #> 1 2 Pac Baby Don't Cry (Keep... 1 87 2000-02-26 #> 2 2 Pac Baby Don't Cry (Keep... 2 82 2000-03-04 #> 3 2 Pac Baby Don't Cry (Keep... 3 72 2000-03-11 #> 4 2 Pac Baby Don't Cry (Keep... 4 77 2000-03-18 #> 5 2 Pac Baby Don't Cry (Keep... 5 87 2000-03-25 #> 6 2 Pac Baby Don't Cry (Keep... 6 94 2000-04-01 #> 7 2 Pac Baby Don't Cry (Keep... 7 99 2000-04-08 #> 8 2Ge+her The Hardest Part Of ... 1 91 2000-09-02 #> 9 2Ge+her The Hardest Part Of ... 2 87 2000-09-09 #> 10 2Ge+her The Hardest Part Of ... 
3 92 2000-09-16 #> # ℹ 5,297 more rows billboard3 %>% arrange(artist, track, week) #> # A tibble: 5,307 × 5 #> artist track week rank date #> #> 1 2 Pac Baby Don't Cry (Keep... 1 87 2000-02-26 #> 2 2 Pac Baby Don't Cry (Keep... 2 82 2000-03-04 #> 3 2 Pac Baby Don't Cry (Keep... 3 72 2000-03-11 #> 4 2 Pac Baby Don't Cry (Keep... 4 77 2000-03-18 #> 5 2 Pac Baby Don't Cry (Keep... 5 87 2000-03-25 #> 6 2 Pac Baby Don't Cry (Keep... 6 94 2000-04-01 #> 7 2 Pac Baby Don't Cry (Keep... 7 99 2000-04-08 #> 8 2Ge+her The Hardest Part Of ... 1 91 2000-09-02 #> 9 2Ge+her The Hardest Part Of ... 2 87 2000-09-09 #> 10 2Ge+her The Hardest Part Of ... 3 92 2000-09-16 #> # ℹ 5,297 more rows billboard3 %>% arrange(date, rank) #> # A tibble: 5,307 × 5 #> artist track week rank date #> #> 1 Lonestar Amazed 1 81 1999-06-05 #> 2 Lonestar Amazed 2 54 1999-06-12 #> 3 Lonestar Amazed 3 44 1999-06-19 #> 4 Lonestar Amazed 4 39 1999-06-26 #> 5 Lonestar Amazed 5 38 1999-07-03 #> 6 Lonestar Amazed 6 33 1999-07-10 #> 7 Lonestar Amazed 7 29 1999-07-17 #> 8 Amber Sexual 1 99 1999-07-17 #> 9 Lonestar Amazed 8 29 1999-07-24 #> 10 Amber Sexual 2 99 1999-07-24 #> # ℹ 5,297 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"multiple-variables-stored-in-one-column","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Multiple variables stored in one column","title":"Tidy data","text":"pivoting columns, key column sometimes combination multiple underlying variable names. happens tb (tuberculosis) dataset, shown . dataset comes World Health Organisation, records counts confirmed tuberculosis cases country, year, demographic group. demographic groups broken sex (m, f) age (0-14, 15-25, 25-34, 35-44, 45-54, 55-64, unknown). First use pivot_longer() gather non-variable columns: Column headers format often separated non-alphanumeric character (e.g. ., -, _, :), fixed width format, like dataset. separate() makes easy split compound variables individual variables. can either pass regular expression split (default split non-alphanumeric columns), vector character positions. case want split first character: Storing values form resolves problem original data. want compare rates, counts, means need know population. original format, easy way add population variable. stored separate table, makes hard correctly match populations counts. tidy form, adding variables population rate easy ’re just additional columns. 
case, also transformation single step supplying multiple column names names_to also supplying grouped regular expression names_pattern:","code":"tb <- as_tibble(read.csv(\"tb.csv\", stringsAsFactors = FALSE)) tb #> # A tibble: 5,769 × 22 #> iso2 year m04 m514 m014 m1524 m2534 m3544 m4554 m5564 m65 mu #> #> 1 AD 1989 NA NA NA NA NA NA NA NA NA NA #> 2 AD 1990 NA NA NA NA NA NA NA NA NA NA #> 3 AD 1991 NA NA NA NA NA NA NA NA NA NA #> 4 AD 1992 NA NA NA NA NA NA NA NA NA NA #> 5 AD 1993 NA NA NA NA NA NA NA NA NA NA #> 6 AD 1994 NA NA NA NA NA NA NA NA NA NA #> 7 AD 1996 NA NA 0 0 0 4 1 0 0 NA #> 8 AD 1997 NA NA 0 0 1 2 2 1 6 NA #> 9 AD 1998 NA NA 0 0 0 1 0 0 0 NA #> 10 AD 1999 NA NA 0 0 0 1 1 0 0 NA #> # ℹ 5,759 more rows #> # ℹ 10 more variables: f04 , f514 , f014 , f1524 , #> # f2534 , f3544 , f4554 , f5564 , f65 , #> # fu tb2 <- tb %>% pivot_longer( !c(iso2, year), names_to = \"demo\", values_to = \"n\", values_drop_na = TRUE ) tb2 #> # A tibble: 35,750 × 4 #> iso2 year demo n #> #> 1 AD 1996 m014 0 #> 2 AD 1996 m1524 0 #> 3 AD 1996 m2534 0 #> 4 AD 1996 m3544 4 #> 5 AD 1996 m4554 1 #> 6 AD 1996 m5564 0 #> 7 AD 1996 m65 0 #> 8 AD 1996 f014 0 #> 9 AD 1996 f1524 1 #> 10 AD 1996 f2534 1 #> # ℹ 35,740 more rows tb3 <- tb2 %>% separate(demo, c(\"sex\", \"age\"), 1) tb3 #> # A tibble: 35,750 × 5 #> iso2 year sex age n #> #> 1 AD 1996 m 014 0 #> 2 AD 1996 m 1524 0 #> 3 AD 1996 m 2534 0 #> 4 AD 1996 m 3544 4 #> 5 AD 1996 m 4554 1 #> 6 AD 1996 m 5564 0 #> 7 AD 1996 m 65 0 #> 8 AD 1996 f 014 0 #> 9 AD 1996 f 1524 1 #> 10 AD 1996 f 2534 1 #> # ℹ 35,740 more rows tb %>% pivot_longer( !c(iso2, year), names_to = c(\"sex\", \"age\"), names_pattern = \"(.)(.+)\", values_to = \"n\", values_drop_na = TRUE ) #> # A tibble: 35,750 × 5 #> iso2 year sex age n #> #> 1 AD 1996 m 014 0 #> 2 AD 1996 m 1524 0 #> 3 AD 1996 m 2534 0 #> 4 AD 1996 m 3544 4 #> 5 AD 1996 m 4554 1 #> 6 AD 1996 m 5564 0 #> 7 AD 1996 m 65 0 #> 8 AD 1996 f 014 0 #> 9 AD 1996 f 1524 1 #> 10 AD 1996 f 2534 1 #> # ℹ 35,740 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"variables-are-stored-in-both-rows-and-columns","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Variables are stored in both rows and columns","title":"Tidy data","text":"complicated form messy data occurs variables stored rows columns. code loads daily weather data Global Historical Climatology Network one weather station (MX17004) Mexico five months 2010. variables individual columns (id, year, month), spread across columns (day, d1-d31) across rows (tmin, tmax) (minimum maximum temperature). Months fewer 31 days structural missing values last day(s) month. tidy dataset first use pivot_longer gather day columns: presentation, ’ve dropped missing values, making implicit rather explicit. ok know many days month can easily reconstruct explicit missing values. ’ll also little cleaning: dataset mostly tidy, element column variable; stores names variables. (shown example meteorological variables prcp (precipitation) snow (snowfall)). 
Fixing requires widening data: pivot_wider() inverse pivot_longer(), pivoting element value back across multiple columns: form tidy: ’s one variable column, row represents one day.","code":"weather <- as_tibble(read.csv(\"weather.csv\", stringsAsFactors = FALSE)) weather #> # A tibble: 22 × 35 #> id year month element d1 d2 d3 d4 d5 d6 d7 #> #> 1 MX17004 2010 1 tmax NA NA NA NA NA NA NA #> 2 MX17004 2010 1 tmin NA NA NA NA NA NA NA #> 3 MX17004 2010 2 tmax NA 27.3 24.1 NA NA NA NA #> 4 MX17004 2010 2 tmin NA 14.4 14.4 NA NA NA NA #> 5 MX17004 2010 3 tmax NA NA NA NA 32.1 NA NA #> 6 MX17004 2010 3 tmin NA NA NA NA 14.2 NA NA #> 7 MX17004 2010 4 tmax NA NA NA NA NA NA NA #> 8 MX17004 2010 4 tmin NA NA NA NA NA NA NA #> 9 MX17004 2010 5 tmax NA NA NA NA NA NA NA #> 10 MX17004 2010 5 tmin NA NA NA NA NA NA NA #> # ℹ 12 more rows #> # ℹ 24 more variables: d8 , d9 , d10 , d11 , #> # d12 , d13 , d14 , d15 , d16 , d17 , #> # d18 , d19 , d20 , d21 , d22 , d23 , #> # d24 , d25 , d26 , d27 , d28 , d29 , #> # d30 , d31 weather2 <- weather %>% pivot_longer( d1:d31, names_to = \"day\", values_to = \"value\", values_drop_na = TRUE ) weather2 #> # A tibble: 66 × 6 #> id year month element day value #> #> 1 MX17004 2010 1 tmax d30 27.8 #> 2 MX17004 2010 1 tmin d30 14.5 #> 3 MX17004 2010 2 tmax d2 27.3 #> 4 MX17004 2010 2 tmax d3 24.1 #> 5 MX17004 2010 2 tmax d11 29.7 #> 6 MX17004 2010 2 tmax d23 29.9 #> 7 MX17004 2010 2 tmin d2 14.4 #> 8 MX17004 2010 2 tmin d3 14.4 #> 9 MX17004 2010 2 tmin d11 13.4 #> 10 MX17004 2010 2 tmin d23 10.7 #> # ℹ 56 more rows weather3 <- weather2 %>% mutate(day = as.integer(gsub(\"d\", \"\", day))) %>% select(id, year, month, day, element, value) weather3 #> # A tibble: 66 × 6 #> id year month day element value #> #> 1 MX17004 2010 1 30 tmax 27.8 #> 2 MX17004 2010 1 30 tmin 14.5 #> 3 MX17004 2010 2 2 tmax 27.3 #> 4 MX17004 2010 2 3 tmax 24.1 #> 5 MX17004 2010 2 11 tmax 29.7 #> 6 MX17004 2010 2 23 tmax 29.9 #> 7 MX17004 2010 2 2 tmin 14.4 #> 8 MX17004 2010 2 3 tmin 14.4 #> 9 MX17004 2010 2 11 tmin 13.4 #> 10 MX17004 2010 2 23 tmin 10.7 #> # ℹ 56 more rows weather3 %>% pivot_wider(names_from = element, values_from = value) #> # A tibble: 33 × 6 #> id year month day tmax tmin #> #> 1 MX17004 2010 1 30 27.8 14.5 #> 2 MX17004 2010 2 2 27.3 14.4 #> 3 MX17004 2010 2 3 24.1 14.4 #> 4 MX17004 2010 2 11 29.7 13.4 #> 5 MX17004 2010 2 23 29.9 10.7 #> 6 MX17004 2010 3 5 32.1 14.2 #> 7 MX17004 2010 3 10 34.5 16.8 #> 8 MX17004 2010 3 16 31.1 17.6 #> 9 MX17004 2010 4 27 36.3 16.7 #> 10 MX17004 2010 5 27 33.2 18.2 #> # ℹ 23 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"multiple-types","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Multiple types in one table","title":"Tidy data","text":"Datasets often involve values collected multiple levels, different types observational units. tidying, type observational unit stored table. closely related idea database normalisation, fact expressed one place. ’s important otherwise inconsistencies can arise. billboard dataset actually contains observations two types observational units: song rank week. manifests duplication facts song: artist repeated many times. dataset needs broken two pieces: song dataset stores artist song name, ranking dataset gives rank song week. 
first extract song dataset: use make rank dataset replacing repeated song facts pointer song details (unique song id): also imagine week dataset record background information week, maybe total number songs sold similar “demographic” information. Normalisation useful tidying eliminating inconsistencies. However, data analysis tools work directly relational data, analysis usually also requires denormalisation merging datasets back one table.","code":"song <- billboard3 %>% distinct(artist, track) %>% mutate(song_id = row_number()) song #> # A tibble: 317 × 3 #> artist track song_id #> #> 1 2 Pac Baby Don't Cry (Keep... 1 #> 2 2Ge+her The Hardest Part Of ... 2 #> 3 3 Doors Down Kryptonite 3 #> 4 3 Doors Down Loser 4 #> 5 504 Boyz Wobble Wobble 5 #> 6 98^0 Give Me Just One Nig... 6 #> 7 A*Teens Dancing Queen 7 #> 8 Aaliyah I Don't Wanna 8 #> 9 Aaliyah Try Again 9 #> 10 Adams, Yolanda Open My Heart 10 #> # ℹ 307 more rows rank <- billboard3 %>% left_join(song, c(\"artist\", \"track\")) %>% select(song_id, date, week, rank) rank #> # A tibble: 5,307 × 4 #> song_id date week rank #> #> 1 1 2000-02-26 1 87 #> 2 1 2000-03-04 2 82 #> 3 1 2000-03-11 3 72 #> 4 1 2000-03-18 4 77 #> 5 1 2000-03-25 5 87 #> 6 1 2000-04-01 6 94 #> 7 1 2000-04-08 7 99 #> 8 2 2000-09-02 1 91 #> 9 2 2000-09-09 2 87 #> 10 2 2000-09-16 3 92 #> # ℹ 5,297 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"one-type-in-multiple-tables","dir":"Articles","previous_headings":"Tidying messy datasets","what":"One type in multiple tables","title":"Tidy data","text":"’s also common find data values single type observational unit spread multiple tables files. tables files often split another variable, represents single year, person, location. long format individual records consistent, easy problem fix: Read files list tables. table, add new column records original file name (file name often value important variable). Combine tables single table. Purrr makes straightforward R. following code generates vector file names directory (data/) match regular expression (ends .csv). Next name element vector name file. preserve names following step, ensuring row final data frame labeled source. Finally, map_dfr() loops path, reading csv file combining results single data frame. single table, can perform additional tidying needed. example type cleaning can found https://github.com/hadley/data-baby-names takes 129 yearly baby name tables provided US Social Security Administration combines single file. complicated situation occurs dataset structure changes time. example, datasets may contain different variables, variables different names, different file formats, different conventions missing values. may require tidy file individually (, ’re lucky, small groups) combine tidied. example type tidying illustrated https://github.com/hadley/data-fuel-economy, shows tidying epa fuel economy data 50,000 cars 1978 2008. raw data available online, year stored separate file four major formats many minor variations, making tidying dataset considerable challenge.","code":"library(purrr) paths <- dir(\"data\", pattern = \"\\\\.csv$\", full.names = TRUE) names(paths) <- basename(paths) map_dfr(paths, read.csv, stringsAsFactors = FALSE, .id = \"filename\")"},{"path":"https://tidyr.tidyverse.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Hadley Wickham. Author, maintainer. Davis Vaughan. Author. Maximilian Girlich. Author. Kevin Ushey. Contributor. . 
Copyright holder, funder.","code":""},{"path":"https://tidyr.tidyverse.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Wickham H, Vaughan D, Girlich M (2024). tidyr: Tidy Messy Data. R package version 1.3.1.9000, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.","code":"@Manual{, title = {tidyr: Tidy Messy Data}, author = {Hadley Wickham and Davis Vaughan and Maximilian Girlich}, year = {2024}, note = {R package version 1.3.1.9000, https://github.com/tidyverse/tidyr}, url = {https://tidyr.tidyverse.org}, }"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"Tidy Messy Data","text":"goal tidyr help create tidy data. Tidy data data : variable column; column variable. observation row; row observation. value cell; cell single value. Tidy data describes standard way storing data used wherever possible throughout tidyverse. ensure data tidy, ’ll spend less time fighting tools time working analysis. Learn tidy data vignette(\"tidy-data\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tidy Messy Data","text":"","code":"# The easiest way to get tidyr is to install the whole tidyverse: install.packages(\"tidyverse\") # Alternatively, install just tidyr: install.packages(\"tidyr\") # Or the development version from GitHub: # install.packages(\"pak\") pak::pak(\"tidyverse/tidyr\")"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"getting-started","dir":"","previous_headings":"","what":"Getting started","title":"Tidy Messy Data","text":"tidyr functions fall five main categories: “Pivoting” converts long wide forms. tidyr 1.0.0 introduces pivot_longer() pivot_wider(), replacing older spread() gather() functions. See vignette(\"pivot\") details. “Rectangling”, turns deeply nested lists (JSON) tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), vignette(\"rectangle\") details. Nesting converts grouped data form group becomes single row containing nested data frame, unnesting opposite. See nest(), unnest(), vignette(\"nest\") details. Splitting combining character columns. Use separate_wider_delim(), separate_wider_position(), separate_wider_regex() pull single character column multiple columns; use unite() combine multiple columns single character column. Make implicit missing values explicit complete(); make explicit missing values implicit drop_na(); replace missing values next/previous value fill(), known value replace_na().","code":"library(tidyr)"},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"related-work","dir":"","previous_headings":"","what":"Related work","title":"Tidy Messy Data","text":"tidyr supersedes reshape2 (2010-2014) reshape (2005-2010). Somewhat counterintuitively, iteration package done less. tidyr designed specifically tidying data, general reshaping (reshape2), general aggregation (reshape). 
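As a small illustration of the splitting and combining family listed under "Getting started" (separate_wider_delim() and unite()), here is a minimal sketch with a made-up tibble; the column names and values are assumptions, not taken from the original:

ratios <- tibble(ratio = c("1-4", "2-3"))

# Split one character column into two, then reunite them into one
ratios %>%
  separate_wider_delim(ratio, delim = "-", names = c("low", "high")) %>%
  unite("ratio", low, high, sep = "-")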
data.table provides high-performance implementations melt() dcast() ’d like read data reshaping CS perspective, ’d recommend following three papers: Wrangler: Interactive visual specification data transformation scripts interactive framework data cleaning (Potter’s wheel) efficiently implementing SchemaSQL SQL database system guide reading, ’s translation terminology used different places:","code":""},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"getting-help","dir":"","previous_headings":"","what":"Getting help","title":"Tidy Messy Data","text":"encounter clear bug, please file minimal reproducible example github. questions discussion, please use forum.posit.co. Please note tidyr project released Contributor Code Conduct. contributing project, agree abide terms.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":null,"dir":"Reference","previous_headings":"","what":"Song rankings for Billboard top 100 in the year 2000 — billboard","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"Song rankings Billboard top 100 year 2000","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"","code":"billboard"},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"dataset variables: artist Artist name track Song name date.enter Date song entered top 100 wk1 – wk76 Rank song week entered","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"\"Whitburn\" project, https://waxy.org/2008/05/the_whitburn_project/, (downloaded April 2008)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Check assumptions about a pivot spec — check_pivot_spec","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"check_pivot_spec() developer facing helper function validating pivot spec used pivot_longer_spec() pivot_wider_spec(). useful extending pivot_longer() pivot_wider() new S3 methods. check_pivot_spec() makes following assertions: spec must data frame. spec must character column named .name. spec must character column named .value. .name column must unique. .name .value columns must first two columns data frame, reordered true.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"","code":"check_pivot_spec(spec, call = caller_env())"},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"spec specification data frame. useful complex pivots gives greater control metadata stored columns become column names result. Must data frame containing character .name .value columns. 
Additional columns spec named match columns long format dataset contain values corresponding columns pivoted wide format. special .seq variable used disambiguate rows internally; automatically removed pivoting.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"","code":"# A valid spec spec <- tibble(.name = \"a\", .value = \"b\", foo = 1) check_pivot_spec(spec) #> # A tibble: 1 × 3 #> .name .value foo #> #> 1 a b 1 spec <- tibble(.name = \"a\") try(check_pivot_spec(spec)) #> Error in eval(expr, envir, enclos) : #> `spec` must have `.name` and `.value` columns. # `.name` and `.value` are forced to be the first two columns spec <- tibble(foo = 1, .value = \"b\", .name = \"a\") check_pivot_spec(spec) #> # A tibble: 1 × 3 #> .name .value foo #> #> 1 a b 1"},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":null,"dir":"Reference","previous_headings":"","what":"Chop and unchop — chop","title":"Chop and unchop — chop","text":"Chopping unchopping preserve width data frame, changing length. chop() makes df shorter converting rows within group list-columns. unchop() makes df longer expanding list-columns element list-column gets row output. chop() unchop() building blocks complicated functions (like unnest(), unnest_longer(), unnest_wider()) generally suitable programming interactive data analysis.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Chop and unchop — chop","text":"","code":"chop(data, cols, ..., error_call = current_env()) unchop( data, cols, ..., keep_empty = FALSE, ptype = NULL, error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Chop and unchop — chop","text":"data data frame. cols Columns chop unchop. unchop(), column list-column containing generalised vectors (e.g. mix NULLs, atomic vector, S3 vectors, lists, data frames). ... dots future extensions must empty. error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information. keep_empty default, get one row output element list unchopping/unnesting. means size-0 element (like NULL empty data frame vector), entire row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements single row missing values. ptype Optionally, named list column name-prototype pairs coerce cols , overriding default guessed combining individual values. Alternatively, single empty ptype can supplied, applied cols.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Chop and unchop — chop","text":"Generally, unchopping useful chopping simplifies complex data structure, nest()ing usually appropriate chop()ing since better preserves connections observations. chop() creates list-columns class vctrs::list_of() ensure consistent behaviour chopped data frame emptied. instance helps getting back original column types roundtrip chop unchop. 
keeps tracks type elements, unchop() able reconstitute correct vector type even empty list-columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Chop and unchop — chop","text":"","code":"# Chop ---------------------------------------------------------------------- df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) # Note that we get one row of output for each unique combination of # non-chopped variables df %>% chop(c(y, z)) #> # A tibble: 3 × 3 #> x y z #> > > #> 1 1 [3] [3] #> 2 2 [2] [2] #> 3 3 [1] [1] # cf nest df %>% nest(data = c(y, z)) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # Unchop -------------------------------------------------------------------- df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3)) df %>% unchop(y) #> # A tibble: 6 × 2 #> x y #> #> 1 2 1 #> 2 3 1 #> 3 3 2 #> 4 4 1 #> 5 4 2 #> 6 4 3 df %>% unchop(y, keep_empty = TRUE) #> # A tibble: 7 × 2 #> x y #> #> 1 1 NA #> 2 2 1 #> 3 3 1 #> 4 3 2 #> 5 4 1 #> 6 4 2 #> 7 4 3 # unchop will error if the types are not compatible: df <- tibble(x = 1:2, y = list(\"1\", 1:3)) try(df %>% unchop(y)) #> Error in list_unchop(col, ptype = col_ptype) : #> Can't combine `x[[1]]` and `x[[2]]` . # Unchopping a list-col of data frames must generate a df-col because # unchop leaves the column names unchanged df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2))) df %>% unchop(y) #> # A tibble: 3 × 2 #> x y$x $y #> #> 1 2 1 NA #> 2 3 NA 1 #> 3 3 NA 2 df %>% unchop(y, keep_empty = TRUE) #> # A tibble: 4 × 2 #> x y$x $y #> #> 1 1 NA NA #> 2 2 1 NA #> 3 3 NA 1 #> 4 3 NA 2"},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":null,"dir":"Reference","previous_headings":"","what":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"Two datasets public data provided Centers Medicare & Medicaid Services, https://data.cms.gov. cms_patient_experience contains lightly cleaned data \"Hospice - Provider Data\", provides list hospice agencies along data quality patient care, https://data.cms.gov/provider-data/dataset/252m-zfp9. 
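A minimal sketch of the point above about emptied data frames; the slice(0) step is just one assumed way of removing all rows, and is not taken from the original examples:

df <- tibble(x = c(1, 1, 2), y = 1:3)
chopped <- df %>% chop(y)   # `y` becomes a list_of<int> list-column

# Even with every row removed, the list_of class records the element type,
# so unchop() should restore an integer `y` column rather than guessing
chopped %>%
  dplyr::slice(0) %>%
  unchop(y)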
cms_patient_care \"Doctors Clinicians Quality Payment Program PY 2020 Virtual Group Public Reporting\", https://data.cms.gov/provider-data/dataset/8c70-d353","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"","code":"cms_patient_experience cms_patient_care"},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"cms_patient_experience data frame 500 observations five variables: org_pac_id,org_nm Organisation ID name measure_cd,measure_title Measure code title prf_rate Measure performance rate cms_patient_care data frame 252 observations five variables: ccn,facility_name Facility ID name measure_abbr Abbreviated measurement title, suitable use variable name score Measure score type Whether score refers rating 100 (\"observed\"), maximum possible value raw score (\"denominator\")","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"","code":"cms_patient_experience %>% dplyr::distinct(measure_cd, measure_title) #> # A tibble: 6 × 2 #> measure_cd measure_title #> #> 1 CAHPS_GRP_1 CAHPS for MIPS SSM: Getting Timely Care, Appointments, and… #> 2 CAHPS_GRP_2 CAHPS for MIPS SSM: How Well Providers Communicate #> 3 CAHPS_GRP_3 CAHPS for MIPS SSM: Patient's Rating of Provider #> 4 CAHPS_GRP_5 CAHPS for MIPS SSM: Health Promotion and Education #> 5 CAHPS_GRP_8 CAHPS for MIPS SSM: Courteous and Helpful Office Staff #> 6 CAHPS_GRP_12 CAHPS for MIPS SSM: Stewardship of Patient Resources cms_patient_experience %>% pivot_wider( id_cols = starts_with(\"org\"), names_from = measure_cd, values_from = prf_rate ) #> # A tibble: 95 × 8 #> org_pac_id org_nm CAHPS_GRP_1 CAHPS_GRP_2 CAHPS_GRP_3 CAHPS_GRP_5 #> #> 1 0446157747 USC CARE ME… 63 87 86 57 #> 2 0446162697 ASSOCIATION… 59 85 83 63 #> 3 0547164295 BEAVER MEDI… 49 NA 75 44 #> 4 0749333730 CAPE PHYSIC… 67 84 85 65 #> 5 0840104360 ALLIANCE PH… 66 87 87 64 #> 6 0840109864 REX HOSPITA… 73 87 84 67 #> 7 0840513552 SCL HEALTH … 58 83 76 58 #> 8 0941545784 GRITMAN MED… 46 86 81 54 #> 9 1052612785 COMMUNITY M… 65 84 80 58 #> 10 1254237779 OUR LADY OF… 61 NA NA 65 #> # ℹ 85 more rows #> # ℹ 2 more variables: CAHPS_GRP_8 , CAHPS_GRP_12 cms_patient_care %>% pivot_wider( names_from = type, values_from = score ) #> # A tibble: 126 × 5 #> ccn facility_name measure_abbr denominator observed #> #> 1 011500 BAPTIST HOSPICE beliefs_add… 202 100 #> 2 011500 BAPTIST HOSPICE composite_p… 202 88.1 #> 3 011500 BAPTIST HOSPICE dyspena_tre… 110 99.1 #> 4 011500 BAPTIST HOSPICE dyspnea_scr… 202 100 #> 5 011500 BAPTIST HOSPICE opioid_bowel 61 100 #> 6 011500 BAPTIST HOSPICE pain_assess… 107 100 #> 7 011500 BAPTIST HOSPICE pain_screen… 202 88.6 #> 8 011500 BAPTIST HOSPICE treat_pref 202 100 #> 9 011500 BAPTIST HOSPICE visits_immi… 232 96.1 #> 10 011501 SOUTHERNCARE NEW BEACON N. 
BI… beliefs_add… 525 100 #> # ℹ 116 more rows cms_patient_care %>% pivot_wider( names_from = measure_abbr, values_from = score ) #> # A tibble: 28 × 12 #> ccn facility_name type beliefs_addressed composite_process #> #> 1 011500 BAPTIST HOSPICE deno… 202 202 #> 2 011500 BAPTIST HOSPICE obse… 100 88.1 #> 3 011501 SOUTHERNCARE NEW BEAC… deno… 525 525 #> 4 011501 SOUTHERNCARE NEW BEAC… obse… 100 100 #> 5 011502 COMFORT CARE COASTAL … deno… 295 295 #> 6 011502 COMFORT CARE COASTAL … obse… 100 99.3 #> 7 011503 SAAD HOSPICE SERVICES deno… 694 694 #> 8 011503 SAAD HOSPICE SERVICES obse… 99.9 96 #> 9 011505 HOSPICE FAMILY CARE deno… 600 600 #> 10 011505 HOSPICE FAMILY CARE obse… 97.8 92 #> # ℹ 18 more rows #> # ℹ 7 more variables: dyspena_treatment , dyspnea_screening , #> # opioid_bowel , pain_assessment , pain_screening , #> # treat_pref , visits_imminent cms_patient_care %>% pivot_wider( names_from = c(measure_abbr, type), values_from = score ) #> # A tibble: 14 × 20 #> ccn facility_name beliefs_addressed_de…¹ beliefs_addressed_ob…² #> #> 1 011500 BAPTIST HOSPICE 202 100 #> 2 011501 SOUTHERNCARE NEW … 525 100 #> 3 011502 COMFORT CARE COAS… 295 100 #> 4 011503 SAAD HOSPICE SERV… 694 99.9 #> 5 011505 HOSPICE FAMILY CA… 600 97.8 #> 6 011506 SOUTHERNCARE NEW … 589 100 #> 7 011508 SOUTHERNCARE NEW … 420 100 #> 8 011510 CULLMAN REGIONAL … 54 100 #> 9 011511 HOSPICE OF THE VA… 179 100 #> 10 011512 SOUTHERNCARE NEW … 396 100 #> 11 011513 SHEPHERD'S COVE H… 335 99.1 #> 12 011514 ST VINCENT'S HOSP… 210 100 #> 13 011516 HOSPICE OF LIMEST… 103 100 #> 14 011517 HOSPICE OF WEST A… 400 99.8 #> # ℹ abbreviated names: ¹​beliefs_addressed_denominator, #> # ²​beliefs_addressed_observed #> # ℹ 16 more variables: composite_process_denominator , #> # composite_process_observed , #> # dyspena_treatment_denominator , #> # dyspena_treatment_observed , #> # dyspnea_screening_denominator , …"},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":null,"dir":"Reference","previous_headings":"","what":"Complete a data frame with missing combinations of data — complete","title":"Complete a data frame with missing combinations of data — complete","text":"Turns implicit missing values explicit missing values. wrapper around expand(), dplyr::full_join() replace_na() useful completing missing combinations data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Complete a data frame with missing combinations of data — complete","text":"","code":"complete(data, ..., fill = list(), explicit = TRUE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Complete a data frame with missing combinations of data — complete","text":"data data frame. ... Specification columns expand complete. Columns can atomic vectors lists. find unique combinations x, y z, including present data, supply variable separate argument: expand(df, x, y, z) complete(df, x, y, z). find combinations occur data, use nesting: expand(df, nesting(x, y, z)). can combine two forms. example, expand(df, nesting(school_id, student_id), date) produce row present school-student combination possible dates. used factors, expand() complete() use full set levels, just appear data. want use values seen data, use forcats::fct_drop(). used continuous variables, may need fill values appear data: use expressions like year = 2010:2020 year = full_seq(year,1). 
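The note above on continuous variables (for example year = full_seq(year, 1)) is not exercised by the complete() examples that follow; a minimal sketch with a made-up `visits` tibble (an assumption, not part of the original documentation):

visits <- tibble(year = c(2010, 2013, 2015), n = c(2, 5, 1))

# Make the skipped years explicit rows; `n` is filled with NA for the new rows
visits %>% complete(year = full_seq(year, 1))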
fill named list variable supplies single value use instead NA missing combinations. explicit implicit (newly created) explicit (pre-existing) missing values filled fill? default, TRUE, set FALSE limit fill implicit missing values.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Complete a data frame with missing combinations of data — complete","text":"grouped data frames created dplyr::group_by(), complete() operates within group. , complete grouping column.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Complete a data frame with missing combinations of data — complete","text":"","code":"df <- tibble( group = c(1:2, 1, 2), item_id = c(1:2, 2, 3), item_name = c(\"a\", \"a\", \"b\", \"b\"), value1 = c(1, NA, 3, 4), value2 = 4:7 ) df #> # A tibble: 4 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 2 2 a NA 5 #> 3 1 2 b 3 6 #> 4 2 3 b 4 7 # Combinations -------------------------------------------------------------- # Generate all possible combinations of `group`, `item_id`, and `item_name` # (whether or not they appear in the data) df %>% complete(group, item_id, item_name) #> # A tibble: 12 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 1 b NA NA #> 3 1 2 a NA NA #> 4 1 2 b 3 6 #> 5 1 3 a NA NA #> 6 1 3 b NA NA #> 7 2 1 a NA NA #> 8 2 1 b NA NA #> 9 2 2 a NA 5 #> 10 2 2 b NA NA #> 11 2 3 a NA NA #> 12 2 3 b 4 7 # Cross all possible `group` values with the unique pairs of # `(item_id, item_name)` that already exist in the data df %>% complete(group, nesting(item_id, item_name)) #> # A tibble: 8 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 2 a NA NA #> 3 1 2 b 3 6 #> 4 1 3 b NA NA #> 5 2 1 a NA NA #> 6 2 2 a NA 5 #> 7 2 2 b NA NA #> 8 2 3 b 4 7 # Within each `group`, generate all possible combinations of # `item_id` and `item_name` that occur in that group df %>% dplyr::group_by(group) %>% complete(item_id, item_name) #> # A tibble: 8 × 5 #> # Groups: group [2] #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 1 b NA NA #> 3 1 2 a NA NA #> 4 1 2 b 3 6 #> 5 2 2 a NA 5 #> 6 2 2 b NA NA #> 7 2 3 a NA NA #> 8 2 3 b 4 7 # Supplying values for new rows --------------------------------------------- # Use `fill` to replace NAs with some value. By default, affects both new # (implicit) and pre-existing (explicit) missing values. df %>% complete( group, nesting(item_id, item_name), fill = list(value1 = 0, value2 = 99) ) #> # A tibble: 8 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 2 a 0 99 #> 3 1 2 b 3 6 #> 4 1 3 b 0 99 #> 5 2 1 a 0 99 #> 6 2 2 a 0 5 #> 7 2 2 b 0 99 #> 8 2 3 b 4 7 # Limit the fill to only the newly created (i.e. 
previously implicit) # missing values with `explicit = FALSE` df %>% complete( group, nesting(item_id, item_name), fill = list(value1 = 0, value2 = 99), explicit = FALSE ) #> # A tibble: 8 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 2 a 0 99 #> 3 1 2 b 3 6 #> 4 1 3 b 0 99 #> 5 2 1 a 0 99 #> 6 2 2 a NA 5 #> 7 2 2 b 0 99 #> 8 2 3 b 4 7"},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":null,"dir":"Reference","previous_headings":"","what":"Completed construction in the US in 2018 — construction","title":"Completed construction in the US in 2018 — construction","text":"Completed construction US 2018","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Completed construction in the US in 2018 — construction","text":"","code":"construction"},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Completed construction in the US in 2018 — construction","text":"dataset variables: Year,Month Record date 1 unit, 2 4 units, 5 units mote Number completed units size Northeast,Midwest,South,West Number completed units region","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Completed construction in the US in 2018 — construction","text":"Completions \"New Residential Construction\" found Table 5 https://www.census.gov/construction/nrc/xls/newresconst.xls (downloaded March 2019)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/deprecated-se.html","id":null,"dir":"Reference","previous_headings":"","what":"Deprecated SE versions of main verbs — deprecated-se","title":"Deprecated SE versions of main verbs — deprecated-se","text":"tidyr used offer twin versions verb suffixed underscore. versions standard evaluation (SE) semantics: rather taking arguments code, like NSE verbs, took arguments value. purpose make possible program tidyr. However, tidyr now uses tidy evaluation semantics. NSE verbs still capture arguments, can now unquote parts arguments. offers full programmability NSE verbs. Thus, underscored versions now superfluous. Unquoting triggers immediate evaluation operand inlines result within captured expression. result can value expression evaluated later rest argument. See vignette(\"programming\", \"dplyr\") information.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/deprecated-se.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Deprecated SE versions of main verbs — deprecated-se","text":"","code":"complete_(data, cols, fill = list(), ...) drop_na_(data, vars) expand_(data, dots, ...) crossing_(x) nesting_(x) extract_( data, col, into, regex = \"([[:alnum:]]+)\", remove = TRUE, convert = FALSE, ... ) fill_(data, fill_cols, .direction = c(\"down\", \"up\")) gather_( data, key_col, value_col, gather_cols, na.rm = FALSE, convert = FALSE, factor_key = FALSE ) nest_(...) separate_rows_(data, cols, sep = \"[^[:alnum:].]+\", convert = FALSE) separate_( data, col, into, sep = \"[^[:alnum:]]+\", remove = TRUE, convert = FALSE, extra = \"warn\", fill = \"warn\", ... 
) spread_( data, key_col, value_col, fill = NA, convert = FALSE, drop = TRUE, sep = NULL ) unite_(data, col, from, sep = \"_\", remove = TRUE) unnest_(...)"},{"path":"https://tidyr.tidyverse.org/dev/reference/deprecated-se.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Deprecated SE versions of main verbs — deprecated-se","text":"data data frame fill named list variable supplies single value use instead NA missing combinations. ... Specification columns expand complete. Columns can atomic vectors lists. find unique combinations x, y z, including present data, supply variable separate argument: expand(df, x, y, z) complete(df, x, y, z). find combinations occur data, use nesting: expand(df, nesting(x, y, z)). can combine two forms. example, expand(df, nesting(school_id, student_id), date) produce row present school-student combination possible dates. used factors, expand() complete() use full set levels, just appear data. want use values seen data, use forcats::fct_drop(). used continuous variables, may need fill values appear data: use expressions like year = 2010:2020 year = full_seq(year,1). vars, cols, col Name columns. x nesting_ crossing_ list variables. Names new variables create character vector. Use NA omit variable output. regex string representing regular expression used extract desired values. one group (defined ()) element . remove TRUE, remove input column output data frame. convert TRUE, run type.convert() .= TRUE new columns. useful component columns integer, numeric logical. NB: cause string \"NA\"s converted NAs. fill_cols Character vector column names. .direction Direction fill missing values. Currently either \"\" (default), \"\", \"downup\" (.e. first ) \"updown\" (first ). key_col, value_col Strings giving names key value cols. gather_cols Character vector giving column names gathered pair key-value columns. na.rm TRUE, remove rows output value column NA. factor_key FALSE, default, key values stored character vector. TRUE, stored factor, preserves original ordering columns. sep Separator delimiting collapsed values. extra sep character vector, controls happens many pieces. three valid options: \"warn\" (default): emit warning drop extra values. \"drop\": drop extra values without warning. \"merge\": splits length() times drop FALSE, keep factor levels appear data, filling missing combinations fill. Names existing columns character vector","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":null,"dir":"Reference","previous_headings":"","what":"Drop rows containing missing values — drop_na","title":"Drop rows containing missing values — drop_na","text":"drop_na() drops rows column specified ... contains missing value.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Drop rows containing missing values — drop_na","text":"","code":"drop_na(data, ...)"},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Drop rows containing missing values — drop_na","text":"data data frame. ... Columns inspect missing values. 
empty, columns used.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Drop rows containing missing values — drop_na","text":"Another way interpret drop_na() keeps \"complete\" rows (rows contain missing values). Internally, completeness computed vctrs::vec_detect_complete().","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Drop rows containing missing values — drop_na","text":"","code":"df <- tibble(x = c(1, 2, NA), y = c(\"a\", NA, \"b\")) df %>% drop_na() #> # A tibble: 1 × 2 #> x y #> #> 1 1 a df %>% drop_na(x) #> # A tibble: 2 × 2 #> x y #> #> 1 1 a #> 2 2 NA vars <- \"y\" df %>% drop_na(x, any_of(vars)) #> # A tibble: 1 × 2 #> x y #> #> 1 1 a"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":null,"dir":"Reference","previous_headings":"","what":"Expand data frame to include all possible combinations of values — expand","title":"Expand data frame to include all possible combinations of values — expand","text":"expand() generates combination variables found dataset. paired nesting() crossing() helpers. crossing() wrapper around expand_grid() de-duplicates sorts inputs; nesting() helper finds combinations already present data. expand() often useful conjunction joins: use right_join() convert implicit missing values explicit missing values (e.g., fill gaps data frame). use anti_join() figure combinations missing (e.g., identify gaps data frame).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Expand data frame to include all possible combinations of values — expand","text":"","code":"expand(data, ..., .name_repair = \"check_unique\") crossing(..., .name_repair = \"check_unique\") nesting(..., .name_repair = \"check_unique\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Expand data frame to include all possible combinations of values — expand","text":"data data frame. ... Specification columns expand complete. Columns can atomic vectors lists. find unique combinations x, y z, including present data, supply variable separate argument: expand(df, x, y, z) complete(df, x, y, z). find combinations occur data, use nesting: expand(df, nesting(x, y, z)). can combine two forms. example, expand(df, nesting(school_id, student_id), date) produce row present school-student combination possible dates. used factors, expand() complete() use full set levels, just appear data. want use values seen data, use forcats::fct_drop(). used continuous variables, may need fill values appear data: use expressions like year = 2010:2020 year = full_seq(year,1). .name_repair Treatment problematic column names: \"minimal\": name repair checks, beyond basic existence, \"unique\": Make sure names unique empty, \"check_unique\": (default value), name repair, check unique, \"universal\": Make names unique syntactic function: apply custom name repair (e.g., .name_repair = make.names names style base R). purrr-style anonymous function, see rlang::as_function() argument passed repair vctrs::vec_as_names(). 
See details terms strategies used enforce .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Expand data frame to include all possible combinations of values — expand","text":"grouped data frames created dplyr::group_by(), expand() operates within group. , expand grouping column.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Expand data frame to include all possible combinations of values — expand","text":"","code":"# Finding combinations ------------------------------------------------------ fruits <- tibble( type = c(\"apple\", \"orange\", \"apple\", \"orange\", \"orange\", \"orange\"), year = c(2010, 2010, 2012, 2010, 2011, 2012), size = factor( c(\"XS\", \"S\", \"M\", \"S\", \"S\", \"M\"), levels = c(\"XS\", \"S\", \"M\", \"L\") ), weights = rnorm(6, as.numeric(size) + 2) ) # All combinations, including factor levels that are not used fruits %>% expand(type) #> # A tibble: 2 × 1 #> type #> #> 1 apple #> 2 orange fruits %>% expand(size) #> # A tibble: 4 × 1 #> size #> #> 1 XS #> 2 S #> 3 M #> 4 L fruits %>% expand(type, size) #> # A tibble: 8 × 2 #> type size #> #> 1 apple XS #> 2 apple S #> 3 apple M #> 4 apple L #> 5 orange XS #> 6 orange S #> 7 orange M #> 8 orange L fruits %>% expand(type, size, year) #> # A tibble: 24 × 3 #> type size year #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple S 2010 #> 5 apple S 2011 #> 6 apple S 2012 #> 7 apple M 2010 #> 8 apple M 2011 #> 9 apple M 2012 #> 10 apple L 2010 #> # ℹ 14 more rows # Only combinations that already appear in the data fruits %>% expand(nesting(type)) #> # A tibble: 2 × 1 #> type #> #> 1 apple #> 2 orange fruits %>% expand(nesting(size)) #> # A tibble: 3 × 1 #> size #> #> 1 XS #> 2 S #> 3 M fruits %>% expand(nesting(type, size)) #> # A tibble: 4 × 2 #> type size #> #> 1 apple XS #> 2 apple M #> 3 orange S #> 4 orange M fruits %>% expand(nesting(type, size, year)) #> # A tibble: 5 × 3 #> type size year #> #> 1 apple XS 2010 #> 2 apple M 2012 #> 3 orange S 2010 #> 4 orange S 2011 #> 5 orange M 2012 # Other uses ---------------------------------------------------------------- # Use with `full_seq()` to fill in values of continuous variables fruits %>% expand(type, size, full_seq(year, 1)) #> # A tibble: 24 × 3 #> type size `full_seq(year, 1)` #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple S 2010 #> 5 apple S 2011 #> 6 apple S 2012 #> 7 apple M 2010 #> 8 apple M 2011 #> 9 apple M 2012 #> 10 apple L 2010 #> # ℹ 14 more rows fruits %>% expand(type, size, 2010:2013) #> # A tibble: 32 × 3 #> type size `2010:2013` #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple XS 2013 #> 5 apple S 2010 #> 6 apple S 2011 #> 7 apple S 2012 #> 8 apple S 2013 #> 9 apple M 2010 #> 10 apple M 2011 #> # ℹ 22 more rows # Use `anti_join()` to determine which observations are missing all <- fruits %>% expand(type, size, year) all #> # A tibble: 24 × 3 #> type size year #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple S 2010 #> 5 apple S 2011 #> 6 apple S 2012 #> 7 apple M 2010 #> 8 apple M 2011 #> 9 apple M 2012 #> 10 apple L 2010 #> # ℹ 14 more rows all %>% dplyr::anti_join(fruits) #> Joining with `by = join_by(type, size, year)` #> # A tibble: 19 × 3 #> type size year #> #> 1 apple XS 2011 #> 2 apple XS 2012 #> 3 
apple S 2010 #> 4 apple S 2011 #> 5 apple S 2012 #> 6 apple M 2010 #> 7 apple M 2011 #> 8 apple L 2010 #> 9 apple L 2011 #> 10 apple L 2012 #> 11 orange XS 2010 #> 12 orange XS 2011 #> 13 orange XS 2012 #> 14 orange S 2012 #> 15 orange M 2010 #> 16 orange M 2011 #> 17 orange L 2010 #> 18 orange L 2011 #> 19 orange L 2012 # Use with `right_join()` to fill in missing rows (like `complete()`) fruits %>% dplyr::right_join(all) #> Joining with `by = join_by(type, year, size)` #> # A tibble: 25 × 4 #> type year size weights #> #> 1 apple 2010 XS 1.60 #> 2 orange 2010 S 4.26 #> 3 apple 2012 M 2.56 #> 4 orange 2010 S 3.99 #> 5 orange 2011 S 4.62 #> 6 orange 2012 M 6.15 #> 7 apple 2011 XS NA #> 8 apple 2012 XS NA #> 9 apple 2010 S NA #> 10 apple 2011 S NA #> # ℹ 15 more rows # Use with `group_by()` to expand within each group fruits %>% dplyr::group_by(type) %>% expand(year, size) #> # A tibble: 20 × 3 #> # Groups: type [2] #> type year size #> #> 1 apple 2010 XS #> 2 apple 2010 S #> 3 apple 2010 M #> 4 apple 2010 L #> 5 apple 2012 XS #> 6 apple 2012 S #> 7 apple 2012 M #> 8 apple 2012 L #> 9 orange 2010 XS #> 10 orange 2010 S #> 11 orange 2010 M #> 12 orange 2010 L #> 13 orange 2011 XS #> 14 orange 2011 S #> 15 orange 2011 M #> 16 orange 2011 L #> 17 orange 2012 XS #> 18 orange 2012 S #> 19 orange 2012 M #> 20 orange 2012 L"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a tibble from all combinations of inputs — expand_grid","title":"Create a tibble from all combinations of inputs — expand_grid","text":"expand_grid() heavily motivated expand.grid(). Compared expand.grid(), : Produces sorted output (varying first column slowest, rather fastest). Returns tibble, data frame. Never converts strings factors. add additional attributes. Can expand generalised vector, including data frames.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a tibble from all combinations of inputs — expand_grid","text":"","code":"expand_grid(..., .name_repair = \"check_unique\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a tibble from all combinations of inputs — expand_grid","text":"... Name-value pairs. name become column name output. .name_repair Treatment problematic column names: \"minimal\": name repair checks, beyond basic existence, \"unique\": Make sure names unique empty, \"check_unique\": (default value), name repair, check unique, \"universal\": Make names unique syntactic function: apply custom name repair (e.g., .name_repair = make.names names style base R). purrr-style anonymous function, see rlang::as_function() argument passed repair vctrs::vec_as_names(). See details terms strategies used enforce .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a tibble from all combinations of inputs — expand_grid","text":"tibble one column input .... output one row combination inputs, .e. size equal product sizes inputs. 
implies input length 0, output zero rows.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a tibble from all combinations of inputs — expand_grid","text":"","code":"expand_grid(x = 1:3, y = 1:2) #> # A tibble: 6 × 2 #> x y #> #> 1 1 1 #> 2 1 2 #> 3 2 1 #> 4 2 2 #> 5 3 1 #> 6 3 2 expand_grid(l1 = letters, l2 = LETTERS) #> # A tibble: 676 × 2 #> l1 l2 #> #> 1 a A #> 2 a B #> 3 a C #> 4 a D #> 5 a E #> 6 a F #> 7 a G #> 8 a H #> 9 a I #> 10 a J #> # ℹ 666 more rows # Can also expand data frames expand_grid(df = tibble(x = 1:2, y = c(2, 1)), z = 1:3) #> # A tibble: 6 × 2 #> df$x $y z #> #> 1 1 2 1 #> 2 1 2 2 #> 3 1 2 3 #> 4 2 1 1 #> 5 2 1 2 #> 6 2 1 3 # And matrices expand_grid(x1 = matrix(1:4, nrow = 2), x2 = matrix(5:8, nrow = 2)) #> # A tibble: 4 × 2 #> x1[,1] [,2] x2[,1] [,2] #> #> 1 1 3 5 7 #> 2 1 3 6 8 #> 3 2 4 5 7 #> 4 2 4 6 8"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract a character column into multiple columns using regular expression groups — extract","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"extract() superseded favour separate_wider_regex() polished API better handling problems. Superseded functions go away, receive critical bug fixes. Given regular expression capturing groups, extract() turns group new column. groups match, input NA, output NA.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"","code":"extract( data, col, into, regex = \"([[:alnum:]]+)\", remove = TRUE, convert = FALSE, ... )"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"data data frame. col Column expand. Names new variables create character vector. Use NA omit variable output. regex string representing regular expression used extract desired values. one group (defined ()) element . remove TRUE, remove input column output data frame. convert TRUE, run type.convert() .= TRUE new columns. useful component columns integer, numeric logical. NB: cause string \"NA\"s converted NAs. ... 
Additional arguments passed methods.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"","code":"df <- tibble(x = c(NA, \"a-b\", \"a-d\", \"b-c\", \"d-e\")) df %>% extract(x, \"A\") #> # A tibble: 5 × 1 #> A #> #> 1 NA #> 2 a #> 3 a #> 4 b #> 5 d df %>% extract(x, c(\"A\", \"B\"), \"([[:alnum:]]+)-([[:alnum:]]+)\") #> # A tibble: 5 × 2 #> A B #> #> 1 NA NA #> 2 a b #> 3 a d #> 4 b c #> 5 d e # Now recommended df %>% separate_wider_regex( x, patterns = c(A = \"[[:alnum:]]+\", \"-\", B = \"[[:alnum:]]+\") ) #> # A tibble: 5 × 2 #> A B #> #> 1 NA NA #> 2 a b #> 3 a d #> 4 b c #> 5 d e # If no match, NA: df %>% extract(x, c(\"A\", \"B\"), \"([a-d]+)-([a-d]+)\") #> # A tibble: 5 × 2 #> A B #> #> 1 NA NA #> 2 a b #> 3 a d #> 4 b c #> 5 NA NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract_numeric.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract numeric component of variable. — extract_numeric","title":"Extract numeric component of variable. — extract_numeric","text":"DEPRECATED: please use readr::parse_number() instead.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/extract_numeric.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract numeric component of variable. — extract_numeric","text":"","code":"extract_numeric(x)"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract_numeric.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract numeric component of variable. — extract_numeric","text":"x character vector (factor).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":null,"dir":"Reference","previous_headings":"","what":"Fill in missing values with previous or next value — fill","title":"Fill in missing values with previous or next value — fill","text":"Fills missing values selected columns using next previous entry. useful common output format values repeated, recorded change.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fill in missing values with previous or next value — fill","text":"","code":"fill(data, ..., .direction = c(\"down\", \"up\", \"downup\", \"updown\"))"},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fill in missing values with previous or next value — fill","text":"data data frame. ... Columns fill. .direction Direction fill missing values. Currently either \"\" (default), \"\", \"downup\" (.e. 
first ) \"updown\" (first ).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fill in missing values with previous or next value — fill","text":"Missing values replaced atomic vectors; NULLs replaced lists.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Fill in missing values with previous or next value — fill","text":"grouped data frames created dplyr::group_by(), fill() applied within group, meaning fill across group boundaries.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fill in missing values with previous or next value — fill","text":"","code":"# direction = \"down\" -------------------------------------------------------- # Value (year) is recorded only when it changes sales <- tibble::tribble( ~quarter, ~year, ~sales, \"Q1\", 2000, 66013, \"Q2\", NA, 69182, \"Q3\", NA, 53175, \"Q4\", NA, 21001, \"Q1\", 2001, 46036, \"Q2\", NA, 58842, \"Q3\", NA, 44568, \"Q4\", NA, 50197, \"Q1\", 2002, 39113, \"Q2\", NA, 41668, \"Q3\", NA, 30144, \"Q4\", NA, 52897, \"Q1\", 2004, 32129, \"Q2\", NA, 67686, \"Q3\", NA, 31768, \"Q4\", NA, 49094 ) # `fill()` defaults to replacing missing data from top to bottom sales %>% fill(year) #> # A tibble: 16 × 3 #> quarter year sales #> #> 1 Q1 2000 66013 #> 2 Q2 2000 69182 #> 3 Q3 2000 53175 #> 4 Q4 2000 21001 #> 5 Q1 2001 46036 #> 6 Q2 2001 58842 #> 7 Q3 2001 44568 #> 8 Q4 2001 50197 #> 9 Q1 2002 39113 #> 10 Q2 2002 41668 #> 11 Q3 2002 30144 #> 12 Q4 2002 52897 #> 13 Q1 2004 32129 #> 14 Q2 2004 67686 #> 15 Q3 2004 31768 #> 16 Q4 2004 49094 # direction = \"up\" ---------------------------------------------------------- # Value (pet_type) is missing above tidy_pets <- tibble::tribble( ~rank, ~pet_type, ~breed, 1L, NA, \"Boston Terrier\", 2L, NA, \"Retrievers (Labrador)\", 3L, NA, \"Retrievers (Golden)\", 4L, NA, \"French Bulldogs\", 5L, NA, \"Bulldogs\", 6L, \"Dog\", \"Beagles\", 1L, NA, \"Persian\", 2L, NA, \"Maine Coon\", 3L, NA, \"Ragdoll\", 4L, NA, \"Exotic\", 5L, NA, \"Siamese\", 6L, \"Cat\", \"American Short\" ) # For values that are missing above you can use `.direction = \"up\"` tidy_pets %>% fill(pet_type, .direction = \"up\") #> # A tibble: 12 × 3 #> rank pet_type breed #> #> 1 1 Dog Boston Terrier #> 2 2 Dog Retrievers (Labrador) #> 3 3 Dog Retrievers (Golden) #> 4 4 Dog French Bulldogs #> 5 5 Dog Bulldogs #> 6 6 Dog Beagles #> 7 1 Cat Persian #> 8 2 Cat Maine Coon #> 9 3 Cat Ragdoll #> 10 4 Cat Exotic #> 11 5 Cat Siamese #> 12 6 Cat American Short # direction = \"downup\" ------------------------------------------------------ # Value (n_squirrels) is missing above and below within a group squirrels <- tibble::tribble( ~group, ~name, ~role, ~n_squirrels, 1, \"Sam\", \"Observer\", NA, 1, \"Mara\", \"Scorekeeper\", 8, 1, \"Jesse\", \"Observer\", NA, 1, \"Tom\", \"Observer\", NA, 2, \"Mike\", \"Observer\", NA, 2, \"Rachael\", \"Observer\", NA, 2, \"Sydekea\", \"Scorekeeper\", 14, 2, \"Gabriela\", \"Observer\", NA, 3, \"Derrick\", \"Observer\", NA, 3, \"Kara\", \"Scorekeeper\", 9, 3, \"Emily\", \"Observer\", NA, 3, \"Danielle\", \"Observer\", NA ) # The values are inconsistently missing by position within the group # Use .direction = \"downup\" to fill missing values in both directions squirrels %>% 
dplyr::group_by(group) %>% fill(n_squirrels, .direction = \"downup\") %>% dplyr::ungroup() #> # A tibble: 12 × 4 #> group name role n_squirrels #> #> 1 1 Sam Observer 8 #> 2 1 Mara Scorekeeper 8 #> 3 1 Jesse Observer 8 #> 4 1 Tom Observer 8 #> 5 2 Mike Observer 14 #> 6 2 Rachael Observer 14 #> 7 2 Sydekea Scorekeeper 14 #> 8 2 Gabriela Observer 14 #> 9 3 Derrick Observer 9 #> 10 3 Kara Scorekeeper 9 #> 11 3 Emily Observer 9 #> 12 3 Danielle Observer 9 # Using `.direction = \"updown\"` accomplishes the same goal in this example"},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":null,"dir":"Reference","previous_headings":"","what":"Fish encounters — fish_encounters","title":"Fish encounters — fish_encounters","text":"Information fish swimming river: station represents autonomous monitor records tagged fish seen location. Fish travel one direction (migrating downstream). Information misses just important hits, directly recorded form data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fish encounters — fish_encounters","text":"","code":"fish_encounters"},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Fish encounters — fish_encounters","text":"dataset variables: fish Fish identifier station Measurement station seen fish seen? (1 yes, true rows)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Fish encounters — fish_encounters","text":"Dataset provided Myfanwy Johnston; details https://fishsciences.github.io/post/visualizing-fish-encounter-histories/","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":null,"dir":"Reference","previous_headings":"","what":"Create the full sequence of values in a vector — full_seq","title":"Create the full sequence of values in a vector — full_seq","text":"useful want fill missing values observed . example, full_seq(c(1, 2, 4, 6), 1) return 1:6.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create the full sequence of values in a vector — full_seq","text":"","code":"full_seq(x, period, tol = 1e-06)"},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create the full sequence of values in a vector — full_seq","text":"x numeric vector. period Gap observation. existing data checked ensure actually periodicity. tol Numerical tolerance checking periodicity.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create the full sequence of values in a vector — full_seq","text":"","code":"full_seq(c(1, 2, 4, 5, 10), 1) #> [1] 1 2 3 4 5 6 7 8 9 10"},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":null,"dir":"Reference","previous_headings":"","what":"Gather columns into key-value pairs — gather","title":"Gather columns into key-value pairs — gather","text":"Development gather() complete, new code recommend switching pivot_longer(), easier use, featureful, still active development. 
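The fish_encounters description above notes that misses are as important as hits but are not recorded directly. One way to make them explicit is a wide pivot with a fill value; a minimal sketch (using 0 as the fill value is an assumption):

# One column per station; 0 marks stations where a fish was never seen
fish_encounters %>%
  pivot_wider(names_from = station, values_from = seen, values_fill = 0)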
df %>% gather(\"key\", \"value\", x, y, z) equivalent df %>% pivot_longer(c(x, y, z), names_to = \"key\", values_to = \"value\") See details vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gather columns into key-value pairs — gather","text":"","code":"gather( data, key = \"key\", value = \"value\", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE )"},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Gather columns into key-value pairs — gather","text":"data data frame. key, value Names new key value columns, strings symbols. argument passed expression supports quasiquotation (can unquote strings symbols). name captured expression rlang::ensym() (note kind interface symbols represent actual objects now discouraged tidyverse; support backward compatibility). ... selection columns. empty, variables selected. can supply bare variable names, select variables x z x:z, exclude y -y. options, see dplyr::select() documentation. See also section selection rules . na.rm TRUE, remove rows output value column NA. convert TRUE automatically run type.convert() key column. useful column types actually numeric, integer, logical. factor_key FALSE, default, key values stored character vector. TRUE, stored factor, preserves original ordering columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"rules-for-selection","dir":"Reference","previous_headings":"","what":"Rules for selection","title":"Gather columns into key-value pairs — gather","text":"Arguments selecting columns passed tidyselect::vars_select() treated specially. Unlike verbs, selecting functions make strict distinction data expressions context expressions. data expression either bare name like x expression like x:y c(x, y). data expression, can refer columns data frame. Everything else context expression can refer objects defined <-. instance, col1:col3 data expression refers data columns, seq(start, end) context expression refers objects contexts. need refer contextual objects data expression, can use all_of() any_of(). functions used select data-variables whose names stored env-variable. instance, all_of() selects variables listed character vector . 
details, see tidyselect::select_helpers() documentation.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gather columns into key-value pairs — gather","text":"","code":"# From https://stackoverflow.com/questions/1181060 stocks <- tibble( time = as.Date(\"2009-01-01\") + 0:9, X = rnorm(10, 0, 1), Y = rnorm(10, 0, 2), Z = rnorm(10, 0, 4) ) gather(stocks, \"stock\", \"price\", -time) #> # A tibble: 30 × 3 #> time stock price #> #> 1 2009-01-01 X -1.82 #> 2 2009-01-02 X -0.247 #> 3 2009-01-03 X -0.244 #> 4 2009-01-04 X -0.283 #> 5 2009-01-05 X -0.554 #> 6 2009-01-06 X 0.629 #> 7 2009-01-07 X 2.07 #> 8 2009-01-08 X -1.63 #> 9 2009-01-09 X 0.512 #> 10 2009-01-10 X -1.86 #> # ℹ 20 more rows stocks %>% gather(\"stock\", \"price\", -time) #> # A tibble: 30 × 3 #> time stock price #> #> 1 2009-01-01 X -1.82 #> 2 2009-01-02 X -0.247 #> 3 2009-01-03 X -0.244 #> 4 2009-01-04 X -0.283 #> 5 2009-01-05 X -0.554 #> 6 2009-01-06 X 0.629 #> 7 2009-01-07 X 2.07 #> 8 2009-01-08 X -1.63 #> 9 2009-01-09 X 0.512 #> 10 2009-01-10 X -1.86 #> # ℹ 20 more rows # get first observation for each Species in iris data -- base R mini_iris <- iris[c(1, 51, 101), ] # gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width gather(mini_iris, key = \"flower_att\", value = \"measurement\", Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) #> Species flower_att measurement #> 1 setosa Sepal.Length 5.1 #> 2 versicolor Sepal.Length 7.0 #> 3 virginica Sepal.Length 6.3 #> 4 setosa Sepal.Width 3.5 #> 5 versicolor Sepal.Width 3.2 #> 6 virginica Sepal.Width 3.3 #> 7 setosa Petal.Length 1.4 #> 8 versicolor Petal.Length 4.7 #> 9 virginica Petal.Length 6.0 #> 10 setosa Petal.Width 0.2 #> 11 versicolor Petal.Width 1.4 #> 12 virginica Petal.Width 2.5 # same result but less verbose gather(mini_iris, key = \"flower_att\", value = \"measurement\", -Species) #> Species flower_att measurement #> 1 setosa Sepal.Length 5.1 #> 2 versicolor Sepal.Length 7.0 #> 3 virginica Sepal.Length 6.3 #> 4 setosa Sepal.Width 3.5 #> 5 versicolor Sepal.Width 3.2 #> 6 virginica Sepal.Width 3.3 #> 7 setosa Petal.Length 1.4 #> 8 versicolor Petal.Length 4.7 #> 9 virginica Petal.Length 6.0 #> 10 setosa Petal.Width 0.2 #> 11 versicolor Petal.Width 1.4 #> 12 virginica Petal.Width 2.5"},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":null,"dir":"Reference","previous_headings":"","what":"Hoist values out of list-columns — hoist","title":"Hoist values out of list-columns — hoist","text":"hoist() allows selectively pull components list-column top-level columns, using syntax purrr::pluck(). Learn vignette(\"rectangle\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Hoist values out of list-columns — hoist","text":"","code":"hoist( .data, .col, ..., .remove = TRUE, .simplify = TRUE, .ptype = NULL, .transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Hoist values out of list-columns — hoist","text":".data data frame. .col List-column extract components . ... Components .col turn columns form col_name = \"pluck_specification\". can pluck name character vector, position integer vector, combination two list. See purrr::pluck() details. 
column names must unique call hoist(), although existing columns name overwritten. plucking single string can choose omit name, .e. hoist(df, col, \"x\") short-hand hoist(df, col, x = \"x\"). .remove TRUE, default, remove extracted components .col. ensures value lives one place. components removed .col, .col removed result entirely. .simplify TRUE, attempt simplify lists length-1 vectors atomic vector. Can also named list containing TRUE FALSE declaring whether attempt simplify particular column. named list provided, default unspecified columns TRUE. .ptype Optionally, named list prototypes declaring desired output type component. Alternatively, single empty prototype can supplied, applied components. Use argument want check element type expect simplifying. ptype specified, simplify = FALSE simplification possible, list-column returned element type ptype. .transform Optionally, named list transformation functions applied component. Alternatively, single function can supplied, applied components. Use argument want transform parse individual elements extracted. ptype transform supplied, transform applied ptype.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Hoist values out of list-columns — hoist","text":"","code":"df <- tibble( character = c(\"Toothless\", \"Dory\"), metadata = list( list( species = \"dragon\", color = \"black\", films = c( \"How to Train Your Dragon\", \"How to Train Your Dragon 2\", \"How to Train Your Dragon: The Hidden World\" ) ), list( species = \"blue tang\", color = \"blue\", films = c(\"Finding Nemo\", \"Finding Dory\") ) ) ) df #> # A tibble: 2 × 2 #> character metadata #> #> 1 Toothless #> 2 Dory # Extract only specified components df %>% hoist(metadata, \"species\", first_film = list(\"films\", 1L), third_film = list(\"films\", 3L) ) #> # A tibble: 2 × 5 #> character species first_film third_film metadata #> #> 1 Toothless dragon How to Train Your Dragon How to Train … #> 2 Dory blue tang Finding Nemo NA "},{"path":"https://tidyr.tidyverse.org/dev/reference/household.html","id":null,"dir":"Reference","previous_headings":"","what":"Household data — household","title":"Household data — household","text":"dataset based example vignette(\"datatable-reshape\", package = \"data.table\")","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/household.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Household data — household","text":"","code":"household"},{"path":"https://tidyr.tidyverse.org/dev/reference/household.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Household data — household","text":"data frame 5 rows 5 columns: family Family identifier dob_child1 Date birth first child dob_child2 Date birth second child name_child1 Name first child name_child2 Name second child","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":null,"dir":"Reference","previous_headings":"","what":"Nest rows into a list-column of data frames — nest","title":"Nest rows into a list-column of data frames — nest","text":"Nesting creates list-column data frames; unnesting flattens back regular columns. Nesting implicitly summarising operation: get one row group defined non-nested columns. useful conjunction summaries work whole datasets, notably models. 
Learn vignette(\"nest\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Nest rows into a list-column of data frames — nest","text":"","code":"nest(.data, ..., .by = NULL, .key = NULL, .names_sep = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Nest rows into a list-column of data frames — nest","text":".data data frame. ... Columns nest; appear inner data frames. Specified using name-variable pairs form new_col = c(col1, col2, col3). right hand side can valid tidyselect expression. supplied, ... derived columns selected ., use column name .key. : previously write df %>% nest(x, y, z). Convert df %>% nest(data = c(x, y, z)). . Columns nest ; remain outer data frame. .can used place conjunction columns supplied .... supplied, .derived columns selected .... .key name resulting nested column. applicable ... specified, .e. case df %>% nest(.= x). NULL, \"data\" used default. .names_sep NULL, default, inner names come former outer names. string, new inner names use outer names names_sep automatically stripped. makes names_sep roughly symmetric nesting unnesting.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Nest rows into a list-column of data frames — nest","text":"neither ... .supplied, nest() nest variables, use column name supplied .key.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"new-syntax","dir":"Reference","previous_headings":"","what":"New syntax","title":"Nest rows into a list-column of data frames — nest","text":"tidyr 1.0.0 introduced new syntax nest() unnest() designed similar functions. Converting new syntax straightforward (guided message receive) just need run old analysis, can easily revert previous behaviour using nest_legacy() unnest_legacy() follows:","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Nest rows into a list-column of data frames — nest","text":"df %>% nest(data = c(x, y)) specifies columns nested; .e. columns appear inner data frame. df %>% nest(.= c(x, y)) specifies columns nest ; .e. columns remain outer data frame. alternative way achieve latter nest() grouped data frame created dplyr::group_by(). grouping variables remain outer data frame others nested. result preserves grouping input. Variables supplied nest() override grouping variables df %>% group_by(x, y) %>% nest(data = !z) equivalent df %>% nest(data = !z). supply .grouped data frame, groups already represent nesting .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Nest rows into a list-column of data frames — nest","text":"","code":"df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) # Specify variables to nest using name-variable pairs. # Note that we get one row of output for each unique combination of # non-nested variables. 
df %>% nest(data = c(y, z)) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # Specify variables to nest by (rather than variables to nest) using `.by` df %>% nest(.by = x) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # In this case, since `...` isn't used you can specify the resulting column # name with `.key` df %>% nest(.by = x, .key = \"cols\") #> # A tibble: 3 × 2 #> x cols #> #> 1 1 #> 2 2 #> 3 3 # Use tidyselect syntax and helpers, just like in `dplyr::select()` df %>% nest(data = any_of(c(\"y\", \"z\"))) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # `...` and `.by` can be used together to drop columns you no longer need, # or to include the columns you are nesting by in the inner data frame too. # This drops `z`: df %>% nest(data = y, .by = x) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # This includes `x` in the inner data frame: df %>% nest(data = everything(), .by = x) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # Multiple nesting structures can be specified at once iris %>% nest(petal = starts_with(\"Petal\"), sepal = starts_with(\"Sepal\")) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica iris %>% nest(width = contains(\"Width\"), length = contains(\"Length\")) #> # A tibble: 3 × 3 #> Species width length #> #> 1 setosa #> 2 versicolor #> 3 virginica # Nesting a grouped data frame nests all variables apart from the group vars fish_encounters %>% dplyr::group_by(fish) %>% nest() #> # A tibble: 19 × 2 #> # Groups: fish [19] #> fish data #> #> 1 4842 #> 2 4843 #> 3 4844 #> 4 4845 #> 5 4847 #> 6 4848 #> 7 4849 #> 8 4850 #> 9 4851 #> 10 4854 #> 11 4855 #> 12 4857 #> 13 4858 #> 14 4859 #> 15 4861 #> 16 4862 #> 17 4863 #> 18 4864 #> 19 4865 # That is similar to `nest(.by = )`, except here the result isn't grouped fish_encounters %>% nest(.by = fish) #> # A tibble: 19 × 2 #> fish data #> #> 1 4842 #> 2 4843 #> 3 4844 #> 4 4845 #> 5 4847 #> 6 4848 #> 7 4849 #> 8 4850 #> 9 4851 #> 10 4854 #> 11 4855 #> 12 4857 #> 13 4858 #> 14 4859 #> 15 4861 #> 16 4862 #> 17 4863 #> 18 4864 #> 19 4865 # Nesting is often useful for creating per group models mtcars %>% nest(.by = cyl) %>% dplyr::mutate(models = lapply(data, function(df) lm(mpg ~ wt, data = df))) #> # A tibble: 3 × 3 #> cyl data models #> #> 1 6 #> 2 4 #> 3 8 "},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":null,"dir":"Reference","previous_headings":"","what":"Legacy versions of nest() and unnest() — nest_legacy","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"tidyr 1.0.0 introduced new syntax nest() unnest(). majority existing usage automatically translated new syntax warning. However, need quickly roll back previous behaviour, functions provide previous interface. make old code work , add following code top script:","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"","code":"nest_legacy(data, ..., .key = \"data\") unnest_legacy(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"data data frame. ... Specification columns unnest. 
Use bare variable names functions variables. omitted, defaults list-cols. .key name new column, string symbol. argument passed expression supports quasiquotation (can unquote strings symbols). name captured expression rlang::ensym() (note kind interface symbols represent actual objects now discouraged tidyverse; support backward compatibility). .drop additional list columns dropped? default, unnest() drop unnesting specified columns requires rows duplicated. .id Data frame identifier - supplied, create new column name .id, giving unique identifier. useful list column named. .sep non-NULL, names unnested data frame columns combine name original list-col names nested data frame, separated .sep. .preserve Optionally, list-columns preserve output. duplicated way atomic vectors. dplyr::select() semantics can preserve multiple variables .preserve = c(x, y) .preserve = starts_with(\"list\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"","code":"# Nest and unnest are inverses df <- tibble(x = c(1, 1, 2), y = 3:1) df %>% nest_legacy(y) #> # A tibble: 2 × 2 #> x data #> #> 1 1 #> 2 2 df %>% nest_legacy(y) %>% unnest_legacy() #> # A tibble: 3 × 2 #> x y #> #> 1 1 3 #> 2 1 2 #> 3 2 1 # nesting ------------------------------------------------------------------- as_tibble(iris) %>% nest_legacy(!Species) #> # A tibble: 3 × 2 #> Species data #> #> 1 setosa #> 2 versicolor #> 3 virginica as_tibble(chickwts) %>% nest_legacy(weight) #> # A tibble: 6 × 2 #> feed data #> #> 1 horsebean #> 2 linseed #> 3 soybean #> 4 sunflower #> 5 meatmeal #> 6 casein # unnesting ----------------------------------------------------------------- df <- tibble( x = 1:2, y = list( tibble(z = 1), tibble(z = 3:4) ) ) df %>% unnest_legacy(y) #> # A tibble: 3 × 2 #> x z #> #> 1 1 1 #> 2 2 3 #> 3 2 4 # You can also unnest multiple columns simultaneously df <- tibble( a = list(c(\"a\", \"b\"), \"c\"), b = list(1:2, 3), c = c(11, 22) ) df %>% unnest_legacy(a, b) #> # A tibble: 3 × 3 #> c a b #> #> 1 11 a 1 #> 2 11 b 2 #> 3 22 c 3 # If you omit the column names, it'll unnest all list-cols df %>% unnest_legacy() #> # A tibble: 3 × 3 #> c a b #> #> 1 11 a 1 #> 2 11 b 2 #> 3 22 c 3"},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":null,"dir":"Reference","previous_headings":"","what":"Pack and unpack — pack","title":"Pack and unpack — pack","text":"Packing unpacking preserve length data frame, changing width. pack() makes df narrow collapsing set columns single df-column. unpack() makes data wider expanding df-columns back individual columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pack and unpack — pack","text":"","code":"pack(.data, ..., .names_sep = NULL, .error_call = current_env()) unpack( data, cols, ..., names_sep = NULL, names_repair = \"check_unique\", error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pack and unpack — pack","text":"... pack(), columns pack, specified using name-variable pairs form new_col = c(col1, col2, col3). right hand side can valid tidy select expression. unpack(), dots future extensions must empty. data, .data data frame. cols Columns unpack. 
names_sep, .names_sep NULL, default, names left . pack(), inner names come former outer names; unpack(), new outer names come inner names. string, inner outer names used together. unpack(), names new outer columns formed pasting together outer inner column names, separated names_sep. pack(), new inner names outer names + names_sep automatically stripped. makes names_sep roughly symmetric packing unpacking. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . error_call, .error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Pack and unpack — pack","text":"Generally, unpacking useful packing simplifies complex data structure. Currently, functions work df-cols, mostly curiosity, seem worth exploring mimic nested column headers popular Excel.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pack and unpack — pack","text":"","code":"# Packing ------------------------------------------------------------------- # It's not currently clear why you would ever want to pack columns # since few functions work with this sort of data. 
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3) df #> # A tibble: 3 × 4 #> x1 x2 x3 y #> #> 1 1 4 7 1 #> 2 2 5 8 2 #> 3 3 6 9 3 df %>% pack(x = starts_with(\"x\")) #> # A tibble: 3 × 2 #> y x$x1 $x2 $x3 #> #> 1 1 1 4 7 #> 2 2 2 5 8 #> 3 3 3 6 9 df %>% pack(x = c(x1, x2, x3), y = y) #> # A tibble: 3 × 2 #> x$x1 $x2 $x3 y$y #> #> 1 1 4 7 1 #> 2 2 5 8 2 #> 3 3 6 9 3 # .names_sep allows you to strip off common prefixes; this # acts as a natural inverse to name_sep in unpack() iris %>% as_tibble() %>% pack( Sepal = starts_with(\"Sepal\"), Petal = starts_with(\"Petal\"), .names_sep = \".\" ) #> # A tibble: 150 × 3 #> Species Sepal$Length $Width Petal$Length $Width #> #> 1 setosa 5.1 3.5 1.4 0.2 #> 2 setosa 4.9 3 1.4 0.2 #> 3 setosa 4.7 3.2 1.3 0.2 #> 4 setosa 4.6 3.1 1.5 0.2 #> 5 setosa 5 3.6 1.4 0.2 #> 6 setosa 5.4 3.9 1.7 0.4 #> 7 setosa 4.6 3.4 1.4 0.3 #> 8 setosa 5 3.4 1.5 0.2 #> 9 setosa 4.4 2.9 1.4 0.2 #> 10 setosa 4.9 3.1 1.5 0.1 #> # ℹ 140 more rows # Unpacking ----------------------------------------------------------------- df <- tibble( x = 1:3, y = tibble(a = 1:3, b = 3:1), z = tibble(X = c(\"a\", \"b\", \"c\"), Y = runif(3), Z = c(TRUE, FALSE, NA)) ) df #> # A tibble: 3 × 3 #> x y$a $b z$X $Y $Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA df %>% unpack(y) #> # A tibble: 3 × 4 #> x a b z$X $Y $Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA df %>% unpack(c(y, z)) #> # A tibble: 3 × 6 #> x a b X Y Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA df %>% unpack(c(y, z), names_sep = \"_\") #> # A tibble: 3 × 6 #> x y_a y_b z_X z_Y z_Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See %>% details.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pipe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pipe operator — %>%","text":"","code":"lhs %>% rhs"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from wide to long — pivot_longer","title":"Pivot data from wide to long — pivot_longer","text":"pivot_longer() \"lengthens\" data, increasing number rows decreasing number columns. inverse transformation pivot_wider() Learn vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from wide to long — pivot_longer","text":"","code":"pivot_longer( data, cols, ..., cols_vary = \"fastest\", names_to = \"name\", names_prefix = NULL, names_sep = NULL, names_pattern = NULL, names_ptypes = NULL, names_transform = NULL, names_repair = \"check_unique\", values_to = \"value\", values_drop_na = FALSE, values_ptypes = NULL, values_transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from wide to long — pivot_longer","text":"data data frame pivot. cols Columns pivot longer format. ... Additional arguments passed methods. cols_vary pivoting cols longer format, output rows arranged relative original row number? \"fastest\", default, keeps individual rows cols close together output. 
often produces intuitively ordered output least one key column data involved pivoting process. \"slowest\" keeps individual columns cols close together output. often produces intuitively ordered output utilize columns data pivoting process. names_to character vector specifying new column columns create information stored column names data specified cols. length 0, NULL supplied, columns created. length 1, single column created contain column names specified cols. length >1, multiple columns created. case, one names_sep names_pattern must supplied specify column names split. also two additional character values can take advantage : NA discard corresponding component column name. \".value\" indicates corresponding component column name defines name output column containing cell values, overriding values_to entirely. names_prefix regular expression used remove matching text start variable name. names_sep, names_pattern names_to contains multiple values, arguments control column name broken . names_sep takes specification separate(), can either numeric vector (specifying positions break ), single string (specifying regular expression split ). names_pattern takes specification extract(), regular expression containing matching groups (()). arguments give enough control, use pivot_longer_spec() create spec object process manually needed. names_ptypes, values_ptypes Optionally, list column name-prototype pairs. Alternatively, single empty prototype can supplied, applied columns. prototype (ptype short) zero-length vector (like integer() numeric()) defines type, class, attributes vector. Use arguments want confirm created columns types expect. Note want change (instead confirm) types specific columns, use names_transform values_transform instead. names_transform, values_transform Optionally, list column name-function pairs. Alternatively, single function can supplied, applied columns. Use arguments need change types specific columns. example, names_transform = list(week = .integer) convert character variable called week integer. specified, type columns generated names_to character, type variables generated values_to common type input columns used generate . names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. values_to string specifying name column create data stored cell values. names_to character containing special .value sentinel, value ignored, name value column derived part existing column names. values_drop_na TRUE, drop rows contain NAs value_to column. effectively converts explicit missing values implicit missing values, generally used missing values data created structure.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Pivot data from wide to long — pivot_longer","text":"pivot_longer() updated approach gather(), designed simpler use handle use cases. 
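A short sketch of `names_transform` as described in the arguments above, re-using the `billboard` dataset from the examples that follow so the `week` column comes out as an integer rather than character:

library(tidyr)

billboard %>%
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    names_prefix = "wk",
    names_transform = list(week = as.integer),
    values_to = "rank",
    values_drop_na = TRUE
  )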
recommend use pivot_longer() new code; gather() going away longer active development.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from wide to long — pivot_longer","text":"","code":"# See vignette(\"pivot\") for examples and explanation # Simplest case where column names are character data relig_income #> # A tibble: 18 × 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` #> #> 1 Agnostic 27 34 60 81 76 137 #> 2 Atheist 12 27 37 52 35 70 #> 3 Buddhist 27 21 30 34 33 58 #> 4 Catholic 418 617 732 670 638 1116 #> 5 Don’t know/r… 15 14 15 11 10 35 #> 6 Evangelical … 575 869 1064 982 881 1486 #> 7 Hindu 1 9 7 9 11 34 #> 8 Historically… 228 244 236 238 197 223 #> 9 Jehovah's Wi… 20 27 24 24 21 30 #> 10 Jewish 19 19 25 25 30 95 #> 11 Mainline Prot 289 495 619 655 651 1107 #> 12 Mormon 29 40 48 51 56 112 #> 13 Muslim 6 7 9 10 9 23 #> 14 Orthodox 13 17 23 32 32 47 #> 15 Other Christ… 9 7 11 13 13 14 #> 16 Other Faiths 20 33 40 46 49 63 #> 17 Other World … 5 2 3 4 2 7 #> 18 Unaffiliated 217 299 374 365 341 528 #> # ℹ 4 more variables: `$75-100k` , `$100-150k` , `>150k` , #> # `Don't know/refused` relig_income %>% pivot_longer(!religion, names_to = \"income\", values_to = \"count\") #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows # Slightly more complex case where columns have common prefix, # and missing missings are structural so should be dropped. billboard #> # A tibble: 317 × 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 #> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 #> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA #> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 #> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 #> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 #> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA #> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 #> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 #> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 #> # ℹ 307 more rows #> # ℹ 69 more variables: wk8 , wk9 , wk10 , wk11 , #> # wk12 , wk13 , wk14 , wk15 , wk16 , #> # wk17 , wk18 , wk19 , wk20 , wk21 , #> # wk22 , wk23 , wk24 , wk25 , wk26 , #> # wk27 , wk28 , wk29 , wk30 , wk31 , #> # wk32 , wk33 , wk34 , wk35 , wk36 , … billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", names_prefix = \"wk\", values_to = \"rank\", values_drop_na = TRUE ) #> # A tibble: 5,307 × 5 #> artist track date.entered week rank #> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 2 87 #> 10 2Ge+her The Hardest Part Of ... 
2000-09-02 3 92 #> # ℹ 5,297 more rows # Multiple variables stored in column names who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c(\"diagnosis\", \"gender\", \"age\"), names_pattern = \"new_?(.*)_(.)(.*)\", values_to = \"count\" ) #> # A tibble: 405,440 × 8 #> country iso2 iso3 year diagnosis gender age count #> #> 1 Afghanistan AF AFG 1980 sp m 014 NA #> 2 Afghanistan AF AFG 1980 sp m 1524 NA #> 3 Afghanistan AF AFG 1980 sp m 2534 NA #> 4 Afghanistan AF AFG 1980 sp m 3544 NA #> 5 Afghanistan AF AFG 1980 sp m 4554 NA #> 6 Afghanistan AF AFG 1980 sp m 5564 NA #> 7 Afghanistan AF AFG 1980 sp m 65 NA #> 8 Afghanistan AF AFG 1980 sp f 014 NA #> 9 Afghanistan AF AFG 1980 sp f 1524 NA #> 10 Afghanistan AF AFG 1980 sp f 2534 NA #> # ℹ 405,430 more rows # Multiple observations per row. Since all columns are used in the pivoting # process, we'll use `cols_vary` to keep values from the original columns # close together in the output. anscombe #> x1 x2 x3 x4 y1 y2 y3 y4 #> 1 10 10 10 8 8.04 9.14 7.46 6.58 #> 2 8 8 8 8 6.95 8.14 6.77 5.76 #> 3 13 13 13 8 7.58 8.74 12.74 7.71 #> 4 9 9 9 8 8.81 8.77 7.11 8.84 #> 5 11 11 11 8 8.33 9.26 7.81 8.47 #> 6 14 14 14 8 9.96 8.10 8.84 7.04 #> 7 6 6 6 8 7.24 6.13 6.08 5.25 #> 8 4 4 4 19 4.26 3.10 5.39 12.50 #> 9 12 12 12 8 10.84 9.13 8.15 5.56 #> 10 7 7 7 8 4.82 7.26 6.42 7.91 #> 11 5 5 5 8 5.68 4.74 5.73 6.89 anscombe %>% pivot_longer( everything(), cols_vary = \"slowest\", names_to = c(\".value\", \"set\"), names_pattern = \"(.)(.)\" ) #> # A tibble: 44 × 3 #> set x y #> #> 1 1 10 8.04 #> 2 1 8 6.95 #> 3 1 13 7.58 #> 4 1 9 8.81 #> 5 1 11 8.33 #> 6 1 14 9.96 #> 7 1 6 7.24 #> 8 1 4 4.26 #> 9 1 12 10.8 #> 10 1 7 4.82 #> # ℹ 34 more rows"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from wide to long using a spec — pivot_longer_spec","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"low level interface pivoting, inspired cdata package, allows describe pivoting data frame.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"","code":"pivot_longer_spec( data, spec, ..., cols_vary = \"fastest\", names_repair = \"check_unique\", values_drop_na = FALSE, values_ptypes = NULL, values_transform = NULL, error_call = current_env() ) build_longer_spec( data, cols, ..., names_to = \"name\", values_to = \"value\", names_prefix = NULL, names_sep = NULL, names_pattern = NULL, names_ptypes = NULL, names_transform = NULL, error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"data data frame pivot. spec specification data frame. useful complex pivots gives greater control metadata stored column names turns columns result. Must data frame containing character .name .value columns. Additional columns spec named match columns long format dataset contain values corresponding columns pivoted wide format. special .seq variable used disambiguate rows internally; automatically removed pivoting. ... dots future extensions must empty. cols_vary pivoting cols longer format, output rows arranged relative original row number? 
\"fastest\", default, keeps individual rows cols close together output. often produces intuitively ordered output least one key column data involved pivoting process. \"slowest\" keeps individual columns cols close together output. often produces intuitively ordered output utilize columns data pivoting process. names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. values_drop_na TRUE, drop rows contain NAs value_to column. effectively converts explicit missing values implicit missing values, generally used missing values data created structure. error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information. cols Columns pivot longer format. names_to character vector specifying new column columns create information stored column names data specified cols. length 0, NULL supplied, columns created. length 1, single column created contain column names specified cols. length >1, multiple columns created. case, one names_sep names_pattern must supplied specify column names split. also two additional character values can take advantage : NA discard corresponding component column name. \".value\" indicates corresponding component column name defines name output column containing cell values, overriding values_to entirely. values_to string specifying name column create data stored cell values. names_to character containing special .value sentinel, value ignored, name value column derived part existing column names. names_prefix regular expression used remove matching text start variable name. names_sep, names_pattern names_to contains multiple values, arguments control column name broken . names_sep takes specification separate(), can either numeric vector (specifying positions break ), single string (specifying regular expression split ). names_pattern takes specification extract(), regular expression containing matching groups (()). arguments give enough control, use pivot_longer_spec() create spec object process manually needed. names_ptypes, values_ptypes Optionally, list column name-prototype pairs. Alternatively, single empty prototype can supplied, applied columns. prototype (ptype short) zero-length vector (like integer() numeric()) defines type, class, attributes vector. Use arguments want confirm created columns types expect. Note want change (instead confirm) types specific columns, use names_transform values_transform instead. names_transform, values_transform Optionally, list column name-function pairs. Alternatively, single function can supplied, applied columns. Use arguments need change types specific columns. example, names_transform = list(week = .integer) convert character variable called week integer. specified, type columns generated names_to character, type variables generated values_to common type input columns used generate .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"","code":"# See vignette(\"pivot\") for examples and explanation # Use `build_longer_spec()` to build `spec` using similar syntax to `pivot_longer()` # and run `pivot_longer_spec()` based on `spec`. 
spec <- relig_income %>% build_longer_spec( cols = !religion, names_to = \"income\", values_to = \"count\" ) spec #> # A tibble: 10 × 3 #> .name .value income #> #> 1 <$10k count <$10k #> 2 $10-20k count $10-20k #> 3 $20-30k count $20-30k #> 4 $30-40k count $30-40k #> 5 $40-50k count $40-50k #> 6 $50-75k count $50-75k #> 7 $75-100k count $75-100k #> 8 $100-150k count $100-150k #> 9 >150k count >150k #> 10 Don't know/refused count Don't know/refused pivot_longer_spec(relig_income, spec) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows # Is equivalent to: relig_income %>% pivot_longer( cols = !religion, names_to = \"income\", values_to = \"count\" ) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from long to wide — pivot_wider","title":"Pivot data from long to wide — pivot_wider","text":"pivot_wider() \"widens\" data, increasing number columns decreasing number rows. inverse transformation pivot_longer(). Learn vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from long to wide — pivot_wider","text":"","code":"pivot_wider( data, ..., id_cols = NULL, id_expand = FALSE, names_from = name, names_prefix = \"\", names_sep = \"_\", names_glue = NULL, names_sort = FALSE, names_vary = \"fastest\", names_expand = FALSE, names_repair = \"check_unique\", values_from = value, values_fill = NULL, values_fn = NULL, unused_fn = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from long to wide — pivot_wider","text":"data data frame pivot. ... Additional arguments passed methods. id_cols set columns uniquely identify observation. Typically used redundant variables, .e. variables whose values perfectly correlated existing variables. Defaults columns data except columns specified names_from values_from. tidyselect expression supplied, evaluated data removing columns specified names_from values_from. id_expand values id_cols columns expanded expand() pivoting? results rows, output contain complete expansion possible values id_cols. Implicit factor levels represented data become explicit. Additionally, row values corresponding expanded id_cols sorted. names_from, values_from pair arguments describing column (columns) get name output column (names_from), column (columns) get cell values (values_from). values_from contains multiple values, value added front output column. names_prefix String added start every variable name. particularly useful names_from numeric vector want create syntactic variable names. names_sep names_from values_from contains multiple variables, used join values together single string use column name. 
names_glue Instead names_sep names_prefix, can supply glue specification uses names_from columns (special .value) create custom column names. names_sort column names sorted? FALSE, default, column names ordered first appearance. names_vary names_from identifies column (columns) multiple unique values, multiple values_from columns provided, order resulting column names combined? \"fastest\" varies names_from values fastest, resulting column naming scheme form: value1_name1, value1_name2, value2_name1, value2_name2. default. \"slowest\" varies names_from values slowest, resulting column naming scheme form: value1_name1, value2_name1, value1_name2, value2_name2. names_expand values names_from columns expanded expand() pivoting? results columns, output contain column names corresponding complete expansion possible values names_from. Implicit factor levels represented data become explicit. Additionally, column names sorted, identical names_sort produce. names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. values_fill Optionally, (scalar) value specifies value filled missing. can named list want apply different fill values different value columns. values_fn Optionally, function applied value cell output. typically use combination id_cols names_from columns uniquely identify observation. can named list want apply different aggregations different values_from columns. unused_fn Optionally, function applied summarize values unused columns (.e. columns identified id_cols, names_from, values_from). default drops unused columns result. can named list want apply different aggregations different unused columns. id_cols must supplied unused_fn useful, since otherwise unspecified columns considered id_cols. similar grouping id_cols summarizing unused columns using unused_fn.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Pivot data from long to wide — pivot_wider","text":"pivot_wider() updated approach spread(), designed simpler use handle use cases. 
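A minimal sketch of `id_cols` together with `unused_fn`, using a small made-up tibble (the `sales`, `store`, `quarter`, `revenue`, and `visits` names are illustrative, not from the examples):

library(tidyr)

sales <- tibble(
  store   = c("a", "a", "b", "b"),
  quarter = c("q1", "q2", "q1", "q2"),
  revenue = c(100, 120, 90, 95),
  visits  = c(10, 12, 9, 11)
)

# `id_cols` keeps only `store` as the row identifier; `unused_fn`
# summarises the leftover `visits` column instead of dropping it
sales %>%
  pivot_wider(
    id_cols = store,
    names_from = quarter,
    values_from = revenue,
    unused_fn = list(visits = sum)
  )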
recommend use pivot_wider() new code; spread() going away longer active development.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from long to wide — pivot_wider","text":"","code":"# See vignette(\"pivot\") for examples and explanation fish_encounters #> # A tibble: 114 × 3 #> fish station seen #> #> 1 4842 Release 1 #> 2 4842 I80_1 1 #> 3 4842 Lisbon 1 #> 4 4842 Rstr 1 #> 5 4842 Base_TD 1 #> 6 4842 BCE 1 #> 7 4842 BCW 1 #> 8 4842 BCE2 1 #> 9 4842 BCW2 1 #> 10 4842 MAE 1 #> # ℹ 104 more rows fish_encounters %>% pivot_wider(names_from = station, values_from = seen) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 NA NA NA NA NA #> 5 4847 1 1 1 NA NA NA NA NA NA NA #> 6 4848 1 1 1 1 NA NA NA NA NA NA #> 7 4849 1 1 NA NA NA NA NA NA NA NA #> 8 4850 1 1 NA 1 1 1 1 NA NA NA #> 9 4851 1 1 NA NA NA NA NA NA NA NA #> 10 4854 1 1 NA NA NA NA NA NA NA NA #> 11 4855 1 1 1 1 1 NA NA NA NA NA #> 12 4857 1 1 1 1 1 1 1 1 1 NA #> 13 4858 1 1 1 1 1 1 1 1 1 1 #> 14 4859 1 1 1 1 1 NA NA NA NA NA #> 15 4861 1 1 1 1 1 1 1 1 1 1 #> 16 4862 1 1 1 1 1 1 1 1 1 NA #> 17 4863 1 1 NA NA NA NA NA NA NA NA #> 18 4864 1 1 NA NA NA NA NA NA NA NA #> 19 4865 1 1 1 NA NA NA NA NA NA NA #> # ℹ 1 more variable: MAW # Fill in missing values fish_encounters %>% pivot_wider(names_from = station, values_from = seen, values_fill = 0) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 0 0 0 0 0 #> 5 4847 1 1 1 0 0 0 0 0 0 0 #> 6 4848 1 1 1 1 0 0 0 0 0 0 #> 7 4849 1 1 0 0 0 0 0 0 0 0 #> 8 4850 1 1 0 1 1 1 1 0 0 0 #> 9 4851 1 1 0 0 0 0 0 0 0 0 #> 10 4854 1 1 0 0 0 0 0 0 0 0 #> 11 4855 1 1 1 1 1 0 0 0 0 0 #> 12 4857 1 1 1 1 1 1 1 1 1 0 #> 13 4858 1 1 1 1 1 1 1 1 1 1 #> 14 4859 1 1 1 1 1 0 0 0 0 0 #> 15 4861 1 1 1 1 1 1 1 1 1 1 #> 16 4862 1 1 1 1 1 1 1 1 1 0 #> 17 4863 1 1 0 0 0 0 0 0 0 0 #> 18 4864 1 1 0 0 0 0 0 0 0 0 #> 19 4865 1 1 1 0 0 0 0 0 0 0 #> # ℹ 1 more variable: MAW # Generate column names from multiple variables us_rent_income #> # A tibble: 104 × 5 #> GEOID NAME variable estimate moe #> #> 1 01 Alabama income 24476 136 #> 2 01 Alabama rent 747 3 #> 3 02 Alaska income 32940 508 #> 4 02 Alaska rent 1200 13 #> 5 04 Arizona income 27517 148 #> 6 04 Arizona rent 972 4 #> 7 05 Arkansas income 23789 165 #> 8 05 Arkansas rent 709 5 #> 9 06 California income 29454 109 #> 10 06 California rent 1358 3 #> # ℹ 94 more rows us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # You can control whether `names_from` values vary fastest or slowest # relative to the `values_from` column names using `names_vary`. 
us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe), names_vary = \"slowest\" ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income moe_income estimate_rent moe_rent #> #> 1 01 Alabama 24476 136 747 3 #> 2 02 Alaska 32940 508 1200 13 #> 3 04 Arizona 27517 148 972 4 #> 4 05 Arkansas 23789 165 709 5 #> 5 06 California 29454 109 1358 3 #> 6 08 Colorado 32401 109 1125 5 #> 7 09 Connecticut 35326 195 1123 5 #> 8 10 Delaware 31560 247 1076 10 #> 9 11 District of Co… 43198 681 1424 17 #> 10 12 Florida 25952 70 1077 3 #> # ℹ 42 more rows # When there are multiple `names_from` or `values_from`, you can use # use `names_sep` or `names_glue` to control the output variable names us_rent_income %>% pivot_wider( names_from = variable, names_sep = \".\", values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate.income estimate.rent moe.income moe.rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows us_rent_income %>% pivot_wider( names_from = variable, names_glue = \"{variable}_{.value}\", values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME income_estimate rent_estimate income_moe rent_moe #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # Can perform aggregation with `values_fn` warpbreaks <- as_tibble(warpbreaks[c(\"wool\", \"tension\", \"breaks\")]) warpbreaks #> # A tibble: 54 × 3 #> wool tension breaks #> #> 1 A L 26 #> 2 A L 30 #> 3 A L 54 #> 4 A L 25 #> 5 A L 70 #> 6 A L 52 #> 7 A L 51 #> 8 A L 26 #> 9 A L 67 #> 10 A M 18 #> # ℹ 44 more rows warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = mean ) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L 44.6 28.2 #> 2 M 24 28.8 #> 3 H 24.6 18.8 # Can pass an anonymous function to `values_fn` when you # need to supply additional arguments warpbreaks$breaks[1] <- NA warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = ~ mean(.x, na.rm = TRUE) ) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L 46.9 28.2 #> 2 M 24 28.8 #> 3 H 24.6 18.8"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from long to wide using a spec — pivot_wider_spec","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"low level interface pivoting, inspired cdata package, allows describe pivoting data frame.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"","code":"pivot_wider_spec( data, spec, ..., names_repair = \"check_unique\", id_cols = NULL, id_expand = FALSE, values_fill = NULL, values_fn = NULL, unused_fn = NULL, error_call = current_env() ) build_wider_spec( data, ..., names_from = name, values_from = value, names_prefix = 
\"\", names_sep = \"_\", names_glue = NULL, names_sort = FALSE, names_vary = \"fastest\", names_expand = FALSE, error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"data data frame pivot. spec specification data frame. useful complex pivots gives greater control metadata stored columns become column names result. Must data frame containing character .name .value columns. Additional columns spec named match columns long format dataset contain values corresponding columns pivoted wide format. special .seq variable used disambiguate rows internally; automatically removed pivoting. ... dots future extensions must empty. names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. id_cols set columns uniquely identifies observation. Defaults columns data except columns specified spec$.value columns spec named .name .value. Typically used redundant variables, .e. variables whose values perfectly correlated existing variables. id_expand values id_cols columns expanded expand() pivoting? results rows, output contain complete expansion possible values id_cols. Implicit factor levels represented data become explicit. Additionally, row values corresponding expanded id_cols sorted. values_fill Optionally, (scalar) value specifies value filled missing. can named list want apply different fill values different value columns. values_fn Optionally, function applied value cell output. typically use combination id_cols names_from columns uniquely identify observation. can named list want apply different aggregations different values_from columns. unused_fn Optionally, function applied summarize values unused columns (.e. columns identified id_cols, names_from, values_from). default drops unused columns result. can named list want apply different aggregations different unused columns. id_cols must supplied unused_fn useful, since otherwise unspecified columns considered id_cols. similar grouping id_cols summarizing unused columns using unused_fn. error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information. names_from, values_from pair arguments describing column (columns) get name output column (names_from), column (columns) get cell values (values_from). values_from contains multiple values, value added front output column. names_prefix String added start every variable name. particularly useful names_from numeric vector want create syntactic variable names. names_sep names_from values_from contains multiple variables, used join values together single string use column name. names_glue Instead names_sep names_prefix, can supply glue specification uses names_from columns (special .value) create custom column names. names_sort column names sorted? FALSE, default, column names ordered first appearance. names_vary names_from identifies column (columns) multiple unique values, multiple values_from columns provided, order resulting column names combined? \"fastest\" varies names_from values fastest, resulting column naming scheme form: value1_name1, value1_name2, value2_name1, value2_name2. default. 
\"slowest\" varies names_from values slowest, resulting column naming scheme form: value1_name1, value2_name1, value1_name2, value2_name2. names_expand values names_from columns expanded expand() pivoting? results columns, output contain column names corresponding complete expansion possible values names_from. Implicit factor levels represented data become explicit. Additionally, column names sorted, identical names_sort produce.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"","code":"# See vignette(\"pivot\") for examples and explanation us_rent_income #> # A tibble: 104 × 5 #> GEOID NAME variable estimate moe #> #> 1 01 Alabama income 24476 136 #> 2 01 Alabama rent 747 3 #> 3 02 Alaska income 32940 508 #> 4 02 Alaska rent 1200 13 #> 5 04 Arizona income 27517 148 #> 6 04 Arizona rent 972 4 #> 7 05 Arkansas income 23789 165 #> 8 05 Arkansas rent 709 5 #> 9 06 California income 29454 109 #> 10 06 California rent 1358 3 #> # ℹ 94 more rows spec1 <- us_rent_income %>% build_wider_spec(names_from = variable, values_from = c(estimate, moe)) spec1 #> # A tibble: 4 × 3 #> .name .value variable #> #> 1 estimate_income estimate income #> 2 estimate_rent estimate rent #> 3 moe_income moe income #> 4 moe_rent moe rent us_rent_income %>% pivot_wider_spec(spec1) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # Is equivalent to us_rent_income %>% pivot_wider(names_from = variable, values_from = c(estimate, moe)) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # `pivot_wider_spec()` provides more control over column names and output format # instead of creating columns with estimate_ and moe_ prefixes, # keep original variable name for estimates and attach _moe as suffix spec2 <- tibble( .name = c(\"income\", \"rent\", \"income_moe\", \"rent_moe\"), .value = c(\"estimate\", \"estimate\", \"moe\", \"moe\"), variable = c(\"income\", \"rent\", \"income\", \"rent\") ) us_rent_income %>% pivot_wider_spec(spec2) #> # A tibble: 52 × 6 #> GEOID NAME income rent income_moe rent_moe #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Columbia 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows"},{"path":"https://tidyr.tidyverse.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — 
reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. tibble as_tibble, tibble, tribble tidyselect all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":null,"dir":"Reference","previous_headings":"","what":"Pew religion and income survey — relig_income","title":"Pew religion and income survey — relig_income","text":"Pew religion income survey","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pew religion and income survey — relig_income","text":"","code":"relig_income"},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Pew religion and income survey — relig_income","text":"dataset variables: religion Name religion <$10k-Don\\'t know/refused Number respondees income range column name","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Pew religion and income survey — relig_income","text":"Downloaded https://www.pewresearch.org/religious-landscape-study/database/ (downloaded November 2009)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":null,"dir":"Reference","previous_headings":"","what":"Replace NAs with specified values — replace_na","title":"Replace NAs with specified values — replace_na","text":"Replace NAs specified values","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Replace NAs with specified values — replace_na","text":"","code":"replace_na(data, replace, ...)"},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Replace NAs with specified values — replace_na","text":"data data frame vector. replace data data frame, replace takes named list values, one value column missing values replaced. value replace cast type column data used replacement . data vector, replace takes single value. single value replaces missing values vector. replace cast type data. ... Additional arguments methods. 
Currently unused.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Replace NAs with specified values — replace_na","text":"replace_na() returns object type data.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Replace NAs with specified values — replace_na","text":"","code":"# Replace NAs in a data frame df <- tibble(x = c(1, 2, NA), y = c(\"a\", NA, \"b\")) df %>% replace_na(list(x = 0, y = \"unknown\")) #> # A tibble: 3 × 2 #> x y #> #> 1 1 a #> 2 2 unknown #> 3 0 b # Replace NAs in a vector df %>% dplyr::mutate(x = replace_na(x, 0)) #> # A tibble: 3 × 2 #> x y #> #> 1 1 a #> 2 2 NA #> 3 0 b # OR df$x %>% replace_na(0) #> [1] 1 2 0 df$y %>% replace_na(\"unknown\") #> [1] \"a\" \"unknown\" \"b\" # Replace NULLs in a list: NULLs are the list-col equivalent of NAs df_list <- tibble(z = list(1:5, NULL, 10:20)) df_list %>% replace_na(list(z = list(5))) #> # A tibble: 3 × 1 #> z #> #> 1 #> 2 #> 3 "},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":null,"dir":"Reference","previous_headings":"","what":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"separate() superseded favour separate_wider_position() separate_wider_delim() two functions make two uses obvious, API polished, handling problems better. Superseded functions go away, receive critical bug fixes. Given either regular expression vector character positions, separate() turns single character column multiple columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"","code":"separate( data, col, into, sep = \"[^[:alnum:]]+\", remove = TRUE, convert = FALSE, extra = \"warn\", fill = \"warn\", ... )"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"data data frame. col Column expand. Names new variables create character vector. Use NA omit variable output. sep Separator columns. character, sep interpreted regular expression. default value regular expression matches sequence non-alphanumeric values. numeric, sep interpreted character positions split . Positive values start 1 far-left string; negative value start -1 far-right string. length sep one less . remove TRUE, remove input column output data frame. convert TRUE, run type.convert() .= TRUE new columns. useful component columns integer, numeric logical. NB: cause string \"NA\"s converted NAs. extra sep character vector, controls happens many pieces. three valid options: \"warn\" (default): emit warning drop extra values. \"drop\": drop extra values without warning. \"merge\": splits length() times fill sep character vector, controls happens enough pieces. three valid options: \"warn\" (default): emit warning fill right \"right\": fill missing values right \"left\": fill missing values left ... 
Additional arguments passed methods.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"","code":"# If you want to split by any non-alphanumeric value (the default): df <- tibble(x = c(NA, \"x.y\", \"x.z\", \"y.z\")) df %>% separate(x, c(\"A\", \"B\")) #> # A tibble: 4 × 2 #> A B #> #> 1 NA NA #> 2 x y #> 3 x z #> 4 y z # If you just want the second variable: df %>% separate(x, c(NA, \"B\")) #> # A tibble: 4 × 1 #> B #> #> 1 NA #> 2 y #> 3 z #> 4 z # We now recommend separate_wider_delim() instead: df %>% separate_wider_delim(x, \".\", names = c(\"A\", \"B\")) #> # A tibble: 4 × 2 #> A B #> #> 1 NA NA #> 2 x y #> 3 x z #> 4 y z df %>% separate_wider_delim(x, \".\", names = c(NA, \"B\")) #> # A tibble: 4 × 1 #> B #> #> 1 NA #> 2 y #> 3 z #> 4 z # Controlling uneven splits ------------------------------------------------- # If every row doesn't split into the same number of pieces, use # the extra and fill arguments to control what happens: df <- tibble(x = c(\"x\", \"x y\", \"x y z\", NA)) df %>% separate(x, c(\"a\", \"b\")) #> Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [3]. #> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1]. #> # A tibble: 4 × 2 #> a b #> #> 1 x NA #> 2 x y #> 3 x y #> 4 NA NA # The same behaviour as previous, but drops the c without warnings: df %>% separate(x, c(\"a\", \"b\"), extra = \"drop\", fill = \"right\") #> # A tibble: 4 × 2 #> a b #> #> 1 x NA #> 2 x y #> 3 x y #> 4 NA NA # Opposite of previous, keeping the c and filling left: df %>% separate(x, c(\"a\", \"b\"), extra = \"merge\", fill = \"left\") #> # A tibble: 4 × 2 #> a b #> #> 1 NA x #> 2 x y #> 3 x y z #> 4 NA NA # Or you can keep all three: df %>% separate(x, c(\"a\", \"b\", \"c\")) #> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 2 rows [1, 2]. #> # A tibble: 4 × 3 #> a b c #> #> 1 x NA NA #> 2 x y NA #> 3 x y z #> 4 NA NA NA # To only split a specified number of times use extra = \"merge\": df <- tibble(x = c(\"x: 123\", \"y: error: 7\")) df %>% separate(x, c(\"key\", \"value\"), \": \", extra = \"merge\") #> # A tibble: 2 × 2 #> key value #> #> 1 x 123 #> 2 y error: 7 # Controlling column types -------------------------------------------------- # convert = TRUE detects column classes: df <- tibble(x = c(\"x:1\", \"x:2\", \"y:4\", \"z\", NA)) df %>% separate(x, c(\"key\", \"value\"), \":\") %>% str() #> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4]. #> tibble [5 × 2] (S3: tbl_df/tbl/data.frame) #> $ key : chr [1:5] \"x\" \"x\" \"y\" \"z\" ... #> $ value: chr [1:5] \"1\" \"2\" \"4\" NA ... df %>% separate(x, c(\"key\", \"value\"), \":\", convert = TRUE) %>% str() #> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4]. #> tibble [5 × 2] (S3: tbl_df/tbl/data.frame) #> $ key : chr [1:5] \"x\" \"x\" \"y\" \"z\" ... #> $ value: int [1:5] 1 2 4 NA NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Split a string into rows — separate_longer_delim","title":"Split a string into rows — separate_longer_delim","text":"functions takes string splits multiple rows: separate_longer_delim() splits delimiter. 
separate_longer_position() splits fixed width.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Split a string into rows — separate_longer_delim","text":"","code":"separate_longer_delim(data, cols, delim, ...) separate_longer_position(data, cols, width, ..., keep_empty = FALSE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Split a string into rows — separate_longer_delim","text":"data data frame. cols Columns separate. delim separate_longer_delim(), string giving delimiter values. default, interpreted fixed string; use stringr::regex() friends split ways. ... dots future extensions must empty. width separate_longer_position(), integer giving number characters split . keep_empty default, get ceiling(nchar(x) / width) rows observation. nchar(x) zero, means entire input row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements missing value.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Split a string into rows — separate_longer_delim","text":"data frame based data. columns, different rows.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Split a string into rows — separate_longer_delim","text":"","code":"df <- tibble(id = 1:4, x = c(\"x\", \"x y\", \"x y z\", NA)) df %>% separate_longer_delim(x, delim = \" \") #> # A tibble: 7 × 2 #> id x #> #> 1 1 x #> 2 2 x #> 3 2 y #> 4 3 x #> 5 3 y #> 6 3 z #> 7 4 NA # You can separate multiple columns at once if they have the same structure df <- tibble(id = 1:3, x = c(\"x\", \"x y\", \"x y z\"), y = c(\"a\", \"a b\", \"a b c\")) df %>% separate_longer_delim(c(x, y), delim = \" \") #> # A tibble: 6 × 3 #> id x y #> #> 1 1 x a #> 2 2 x a #> 3 2 y b #> 4 3 x a #> 5 3 y b #> 6 3 z c # Or instead split by a fixed length df <- tibble(id = 1:3, x = c(\"ab\", \"def\", \"\")) df %>% separate_longer_position(x, 1) #> # A tibble: 5 × 2 #> id x #> #> 1 1 a #> 2 1 b #> 3 2 d #> 4 2 e #> 5 2 f df %>% separate_longer_position(x, 2) #> # A tibble: 3 × 2 #> id x #> #> 1 1 ab #> 2 2 de #> 3 2 f df %>% separate_longer_position(x, 2, keep_empty = TRUE) #> # A tibble: 4 × 2 #> id x #> #> 1 1 ab #> 2 2 de #> 3 2 f #> 4 3 NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":null,"dir":"Reference","previous_headings":"","what":"Separate a collapsed column into multiple rows — separate_rows","title":"Separate a collapsed column into multiple rows — separate_rows","text":"separate_rows() superseded favour separate_longer_delim() consistent API separate functions. Superseded functions go away, receive critical bug fixes. 
variable contains observations multiple delimited values, separate_rows() separates values places one row.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Separate a collapsed column into multiple rows — separate_rows","text":"","code":"separate_rows(data, ..., sep = \"[^[:alnum:].]+\", convert = FALSE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Separate a collapsed column into multiple rows — separate_rows","text":"data data frame. ... Columns separate across multiple rows sep Separator delimiting collapsed values. convert TRUE automatically run type.convert() key column. useful column types actually numeric, integer, logical.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Separate a collapsed column into multiple rows — separate_rows","text":"","code":"df <- tibble( x = 1:3, y = c(\"a\", \"d,e,f\", \"g,h\"), z = c(\"1\", \"2,3,4\", \"5,6\") ) separate_rows(df, y, z, convert = TRUE) #> # A tibble: 6 × 3 #> x y z #> #> 1 1 a 1 #> 2 2 d 2 #> 3 2 e 3 #> 4 2 f 4 #> 5 3 g 5 #> 6 3 h 6 # Now recommended df %>% separate_longer_delim(c(y, z), delim = \",\") #> # A tibble: 6 × 3 #> x y z #> #> 1 1 a 1 #> 2 2 d 2 #> 3 2 e 3 #> 4 2 f 4 #> 5 3 g 5 #> 6 3 h 6"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Split a string into columns — separate_wider_delim","title":"Split a string into columns — separate_wider_delim","text":"functions takes string column splits multiple new columns: separate_wider_delim() splits delimiter. separate_wider_position() splits fixed widths. separate_wider_regex() splits regular expression matches. functions equivalent separate() extract(), use stringr underlying string manipulation engine, interfaces reflect learned unnest_wider() unnest_longer().","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Split a string into columns — separate_wider_delim","text":"","code":"separate_wider_delim( data, cols, delim, ..., names = NULL, names_sep = NULL, names_repair = \"check_unique\", too_few = c(\"error\", \"debug\", \"align_start\", \"align_end\"), too_many = c(\"error\", \"debug\", \"drop\", \"merge\"), cols_remove = TRUE ) separate_wider_position( data, cols, widths, ..., names_sep = NULL, names_repair = \"check_unique\", too_few = c(\"error\", \"debug\", \"align_start\"), too_many = c(\"error\", \"debug\", \"drop\"), cols_remove = TRUE ) separate_wider_regex( data, cols, patterns, ..., names_sep = NULL, names_repair = \"check_unique\", too_few = c(\"error\", \"debug\", \"align_start\"), cols_remove = TRUE )"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Split a string into columns — separate_wider_delim","text":"data data frame. cols Columns separate. delim separate_wider_delim(), string giving delimiter values. default, interpreted fixed string; use stringr::regex() friends split ways. ... dots future extensions must empty. names separate_wider_delim(), character vector output column names. 
Use NA components want appear output; number non-NA elements determines number new columns result. names_sep supplied, output names composed input column name followed separator followed new column name. Required cols selects multiple columns. separate_wider_delim() can specify instead names, case names generated source column name, names_sep, numeric suffix. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . too_few happen value separates pieces? \"error\", default, throw error. \"debug\" adds additional columns output help locate resolve underlying problem. option intended help debug issue address generally remain final code. \"align_start\" aligns starts short matches, adding NA end pad correct length. \"align_end\" (separate_wider_delim() ) aligns ends short matches, adding NA start pad correct length. too_many happen value separates many pieces? \"error\", default, throw error. \"debug\" add additional columns output help locate resolve underlying problem. \"drop\" silently drop extra pieces. \"merge\" (separate_wider_delim() ) merge together additional pieces. cols_remove input cols removed output? Always FALSE too_few too_many set \"debug\". widths named numeric vector names become column names, values specify column width. Unnamed components match, included output. patterns named character vector names become column names values regular expressions match contents vector. Unnamed components match, included output.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Split a string into columns — separate_wider_delim","text":"data frame based data. rows, different columns: primary purpose functions create new columns components string. separate_wider_delim() names new columns come names. separate_wider_position() names come names widths. separate_wider_regex() names come names patterns. too_few too_many \"debug\", output contain additional columns useful debugging: {col}_ok: logical vector tells input ok . Use quickly find problematic rows. {col}_remainder: text remaining separation. {col}_pieces, {col}_width, {col}_matches: number pieces, number characters, number matches separate_wider_delim(), separate_wider_position() separate_regexp_wider() respectively. cols_remove = TRUE (default), input cols removed output.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Split a string into columns — separate_wider_delim","text":"","code":"df <- tibble(id = 1:3, x = c(\"m-123\", \"f-455\", \"f-123\")) # There are three basic ways to split up a string into pieces: # 1. with a delimiter df %>% separate_wider_delim(x, delim = \"-\", names = c(\"gender\", \"unit\")) #> # A tibble: 3 × 3 #> id gender unit #> #> 1 1 m 123 #> 2 2 f 455 #> 3 3 f 123 # 2. by length df %>% separate_wider_position(x, c(gender = 1, 1, unit = 3)) #> # A tibble: 3 × 3 #> id gender unit #> #> 1 1 m 123 #> 2 2 f 455 #> 3 3 f 123 # 3. 
defining each component with a regular expression df %>% separate_wider_regex(x, c(gender = \".\", \".\", unit = \"\\\\d+\")) #> # A tibble: 3 × 3 #> id gender unit #> #> 1 1 m 123 #> 2 2 f 455 #> 3 3 f 123 # Sometimes you split on the \"last\" delimiter df <- tibble(var = c(\"race_1\", \"race_2\", \"age_bucket_1\", \"age_bucket_2\")) # _delim won't help because it always splits on the first delimiter try(df %>% separate_wider_delim(var, \"_\", names = c(\"var1\", \"var2\"))) #> Error in separate_wider_delim(., var, \"_\", names = c(\"var1\", \"var2\")) : #> Expected 2 pieces in each element of `var`. #> ! 2 values were too long. #> ℹ Use `too_many = \"debug\"` to diagnose the problem. #> ℹ Use `too_many = \"drop\"/\"merge\"` to silence this message. df %>% separate_wider_delim(var, \"_\", names = c(\"var1\", \"var2\"), too_many = \"merge\") #> # A tibble: 4 × 2 #> var1 var2 #> #> 1 race 1 #> 2 race 2 #> 3 age bucket_1 #> 4 age bucket_2 # Instead, you can use _regex df %>% separate_wider_regex(var, c(var1 = \".*\", \"_\", var2 = \".*\")) #> # A tibble: 4 × 2 #> var1 var2 #> #> 1 race 1 #> 2 race 2 #> 3 age_bucket 1 #> 4 age_bucket 2 # this works because * is greedy; you can mimic the _delim behaviour with .*? df %>% separate_wider_regex(var, c(var1 = \".*?\", \"_\", var2 = \".*\")) #> # A tibble: 4 × 2 #> var1 var2 #> #> 1 race 1 #> 2 race 2 #> 3 age bucket_1 #> 4 age bucket_2 # If the number of components varies, it's most natural to split into rows df <- tibble(id = 1:4, x = c(\"x\", \"x y\", \"x y z\", NA)) df %>% separate_longer_delim(x, delim = \" \") #> # A tibble: 7 × 2 #> id x #> #> 1 1 x #> 2 2 x #> 3 2 y #> 4 3 x #> 5 3 y #> 6 3 z #> 7 4 NA # But separate_wider_delim() provides some tools to deal with the problem # The default behaviour tells you that there's a problem try(df %>% separate_wider_delim(x, delim = \" \", names = c(\"a\", \"b\"))) #> Error in separate_wider_delim(., x, delim = \" \", names = c(\"a\", \"b\")) : #> Expected 2 pieces in each element of `x`. #> ! 1 value was too short. #> ℹ Use `too_few = \"debug\"` to diagnose the problem. #> ℹ Use `too_few = \"align_start\"/\"align_end\"` to silence this message. #> ! 1 value was too long. #> ℹ Use `too_many = \"debug\"` to diagnose the problem. #> ℹ Use `too_many = \"drop\"/\"merge\"` to silence this message. # You can get additional insight by using the debug options df %>% separate_wider_delim( x, delim = \" \", names = c(\"a\", \"b\"), too_few = \"debug\", too_many = \"debug\" ) #> Warning: Debug mode activated: adding variables `x_ok`, `x_pieces`, and #> `x_remainder`. 
#> # A tibble: 4 × 7 #> id a b x x_ok x_pieces x_remainder #> #> 1 1 x NA x FALSE 1 \"\" #> 2 2 x y x y TRUE 2 \"\" #> 3 3 x y x y z FALSE 3 \" z\" #> 4 4 NA NA NA TRUE NA NA # But you can suppress the warnings df %>% separate_wider_delim( x, delim = \" \", names = c(\"a\", \"b\"), too_few = \"align_start\", too_many = \"merge\" ) #> # A tibble: 4 × 3 #> id a b #> #> 1 1 x NA #> 2 2 x y #> 3 3 x y z #> 4 4 NA NA # Or choose to automatically name the columns, producing as many as needed df %>% separate_wider_delim(x, delim = \" \", names_sep = \"\", too_few = \"align_start\") #> # A tibble: 4 × 4 #> id x1 x2 x3 #> #> 1 1 x NA NA #> 2 2 x y NA #> 3 3 x y z #> 4 4 NA NA NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/smiths.html","id":null,"dir":"Reference","previous_headings":"","what":"Some data about the Smith family — smiths","title":"Some data about the Smith family — smiths","text":"small demo dataset describing John Mary Smith.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/smiths.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Some data about the Smith family — smiths","text":"","code":"smiths"},{"path":"https://tidyr.tidyverse.org/dev/reference/smiths.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Some data about the Smith family — smiths","text":"data frame 2 rows 5 columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":null,"dir":"Reference","previous_headings":"","what":"Spread a key-value pair across multiple columns — spread","title":"Spread a key-value pair across multiple columns — spread","text":"Development spread() complete, new code recommend switching pivot_wider(), easier use, featureful, still active development. df %>% spread(key, value) equivalent df %>% pivot_wider(names_from = key, values_from = value) See details vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Spread a key-value pair across multiple columns — spread","text":"","code":"spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Spread a key-value pair across multiple columns — spread","text":"data data frame. key, value Columns use key value. fill set, missing values replaced value. Note two types missingness input: explicit missing values (.e. NA), implicit missings, rows simply present. types missing value replaced fill. convert TRUE, type.convert() asis = TRUE run new columns. useful value column mix variables coerced string. class value column factor date, note true new columns produced, coerced character type conversion. drop FALSE, keep factor levels appear data, filling missing combinations fill. sep NULL, column names taken values key variable. 
non-NULL, column names given \"\".","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Spread a key-value pair across multiple columns — spread","text":"","code":"stocks <- tibble( time = as.Date(\"2009-01-01\") + 0:9, X = rnorm(10, 0, 1), Y = rnorm(10, 0, 2), Z = rnorm(10, 0, 4) ) stocksm <- stocks %>% gather(stock, price, -time) stocksm %>% spread(stock, price) #> # A tibble: 10 × 4 #> time X Y Z #> #> 1 2009-01-01 -2.05 -1.40 0.00192 #> 2 2009-01-02 0.151 1.95 3.02 #> 3 2009-01-03 -0.293 -0.154 1.37 #> 4 2009-01-04 0.255 1.79 0.674 #> 5 2009-01-05 -0.553 -1.56 5.59 #> 6 2009-01-06 1.41 0.874 -2.72 #> 7 2009-01-07 -0.795 0.827 2.95 #> 8 2009-01-08 -1.57 1.95 -3.44 #> 9 2009-01-09 -1.04 2.29 1.68 #> 10 2009-01-10 1.02 2.43 5.80 stocksm %>% spread(time, price) #> # A tibble: 3 × 11 #> stock `2009-01-01` `2009-01-02` `2009-01-03` `2009-01-04` `2009-01-05` #> #> 1 X -2.05 0.151 -0.293 0.255 -0.553 #> 2 Y -1.40 1.95 -0.154 1.79 -1.56 #> 3 Z 0.00192 3.02 1.37 0.674 5.59 #> # ℹ 5 more variables: `2009-01-06` , `2009-01-07` , #> # `2009-01-08` , `2009-01-09` , `2009-01-10` # Spread and gather are complements df <- tibble(x = c(\"a\", \"b\"), y = c(3, 4), z = c(5, 6)) df %>% spread(x, y) %>% gather(\"x\", \"y\", a:b, na.rm = TRUE) #> # A tibble: 2 × 3 #> z x y #> #> 1 5 a 3 #> 2 6 b 4 # Use 'convert = TRUE' to produce variables of mixed type df <- tibble( row = rep(c(1, 51), each = 3), var = rep(c(\"Sepal.Length\", \"Species\", \"Species_num\"), 2), value = c(5.1, \"setosa\", 1, 7.0, \"versicolor\", 2) ) df %>% spread(var, value) %>% str() #> tibble [2 × 4] (S3: tbl_df/tbl/data.frame) #> $ row : num [1:2] 1 51 #> $ Sepal.Length: chr [1:2] \"5.1\" \"7\" #> $ Species : chr [1:2] \"setosa\" \"versicolor\" #> $ Species_num : chr [1:2] \"1\" \"2\" df %>% spread(var, value, convert = TRUE) %>% str() #> tibble [2 × 4] (S3: tbl_df/tbl/data.frame) #> $ row : num [1:2] 1 51 #> $ Sepal.Length: num [1:2] 5.1 7 #> $ Species : chr [1:2] \"setosa\" \"versicolor\" #> $ Species_num : int [1:2] 1 2"},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":null,"dir":"Reference","previous_headings":"","what":"Example tabular representations — table1","title":"Example tabular representations — table1","text":"Data sets demonstrate multiple ways layout tabular data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Example tabular representations — table1","text":"","code":"table1 table2 table3 table4a table4b table5"},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Example tabular representations — table1","text":"https://www..int/teams/global-tuberculosis-programme/data","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Example tabular representations — table1","text":"table1, table2, table3, table4a, table4b, table5 display number TB cases documented World Health Organization Afghanistan, Brazil, China 1999 2000. data contains values associated four variables (country, year, cases, population), table organizes values different layout. 
data subset data contained World Health Organization Global Tuberculosis Report","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr-package.html","id":null,"dir":"Reference","previous_headings":"","what":"tidyr: Tidy Messy Data — tidyr-package","title":"tidyr: Tidy Messy Data — tidyr-package","text":"Tools help create tidy data, column variable, row observation, cell contains single value. 'tidyr' contains tools changing shape (pivoting) hierarchy (nesting 'unnesting') dataset, turning deeply nested lists rectangular data frames ('rectangling'), extracting values string columns. also includes tools working missing values (implicit explicit).","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"tidyr: Tidy Messy Data — tidyr-package","text":"Maintainer: Hadley Wickham hadley@posit.co Authors: Davis Vaughan davis@posit.co Maximilian Girlich contributors: Kevin Ushey kevin@posit.co [contributor] Posit Software, PBC [copyright holder, funder]","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_data_masking.html","id":null,"dir":"Reference","previous_headings":"","what":"Argument type: data-masking — tidyr_data_masking","title":"Argument type: data-masking — tidyr_data_masking","text":"page describes argument modifier indicates argument uses data masking, sub-type tidy evaluation. never heard tidy evaluation , start practical introduction https://r4ds.hadley.nz/functions.html#data-frame-functions read underlying theory https://rlang.r-lib.org/reference/topic-data-mask.html.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_data_masking.html","id":"key-techniques","dir":"Reference","previous_headings":"","what":"Key techniques","title":"Argument type: data-masking — tidyr_data_masking","text":"allow user supply column name function argument, embrace argument, e.g. filter(df, {{ var }}). work column name recorded string, use .data pronoun, e.g. summarise(df, mean = mean(.data[[var]])). suppress R CMD check NOTEs unknown variables use .data$var instead var: also need import .data rlang (e.g.) @importFrom rlang .data.","code":"dist_summary <- function(df, var) { df %>% summarise(n = n(), min = min({{ var }}), max = max({{ var }})) } mtcars %>% dist_summary(mpg) mtcars %>% group_by(cyl) %>% dist_summary(mpg) for (var in names(mtcars)) { mtcars %>% count(.data[[var]]) %>% print() } lapply(names(mtcars), function(var) mtcars %>% count(.data[[var]])) # has NOTE df %>% mutate(z = x + y) # no NOTE df %>% mutate(z = .data$x + .data$y)"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_data_masking.html","id":"dot-dot-dot-","dir":"Reference","previous_headings":"","what":"Dot-dot-dot (...)","title":"Argument type: data-masking — tidyr_data_masking","text":"... automatically provides indirection, can use (.e. without embracing) inside function: can also use := instead = enable glue-like syntax creating variables user supplied data: Learn https://rlang.r-lib.org/reference/topic-data-mask-programming.html.","code":"grouped_mean <- function(df, var, ...) { df %>% group_by(...) 
%>% summarise(mean = mean({{ var }})) } var_name <- \"l100km\" mtcars %>% mutate(\"{var_name}\" := 235 / mpg) summarise_mean <- function(df, var) { df %>% summarise(\"mean_of_{{var}}\" := mean({{ var }})) } mtcars %>% group_by(cyl) %>% summarise_mean(mpg)"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":null,"dir":"Reference","previous_headings":"","what":"Legacy name repair — tidyr_legacy","title":"Legacy name repair — tidyr_legacy","text":"Ensures column names unique using approach found tidyr 0.8.3 earlier. use function want preserve naming strategy, otherwise better adopting new tidyverse standard name_repair = \"universal\"","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Legacy name repair — tidyr_legacy","text":"","code":"tidyr_legacy(nms, prefix = \"V\", sep = \"\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Legacy name repair — tidyr_legacy","text":"nms Character vector names prefix prefix Prefix use unnamed column sep Separator use name unique suffix","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Legacy name repair — tidyr_legacy","text":"","code":"df <- tibble(x = 1:2, y = list(tibble(x = 3:5), tibble(x = 4:7))) # Doesn't work because it would produce a data frame with two # columns called x if (FALSE) { # \\dontrun{ unnest(df, y) } # } # The new tidyverse standard: unnest(df, y, names_repair = \"universal\") #> New names: #> • `x` -> `x...1` #> • `x` -> `x...2` #> # A tibble: 7 × 2 #> x...1 x...2 #> #> 1 1 3 #> 2 1 4 #> 3 1 5 #> 4 2 4 #> 5 2 5 #> 6 2 6 #> 7 2 7 # The old tidyr approach unnest(df, y, names_repair = tidyr_legacy) #> # A tibble: 7 × 2 #> x x1 #> #> 1 1 3 #> 2 1 4 #> 3 1 5 #> 4 2 4 #> 5 2 5 #> 6 2 6 #> 7 2 7"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_tidy_select.html","id":null,"dir":"Reference","previous_headings":"","what":"Argument type: tidy-select — tidyr_tidy_select","title":"Argument type: tidy-select — tidyr_tidy_select","text":"page describes argument modifier indicates argument uses tidy selection, sub-type tidy evaluation. never heard tidy evaluation , start practical introduction https://r4ds.hadley.nz/functions.html#data-frame-functions read underlying theory https://rlang.r-lib.org/reference/topic-data-mask.html.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_tidy_select.html","id":"overview-of-selection-features","dir":"Reference","previous_headings":"","what":"Overview of selection features","title":"Argument type: tidy-select — tidyr_tidy_select","text":"tidyselect implements DSL selecting variables. provides helpers selecting variables: var1:var10: variables lying var1 left var10 right. starts_with(\"\"): names start \"\". ends_with(\"z\"): names end \"z\". contains(\"b\"): names contain \"b\". matches(\"x.y\"): names match regular expression x.y. num_range(x, 1:4): names following pattern, x1, x2, ..., x4. all_of(vars)/any_of(vars): matches names stored character vector vars. all_of(vars) error variables present; any_of(var) match just variables exist. everything(): variables. last_col(): furthest column right. (.numeric): variables .numeric() returns TRUE. well operators combining selections: !selection: variables match selection. 
selection1 & selection2: variables included selection1 selection2. selection1 | selection2: variables match either selection1 selection2.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_tidy_select.html","id":"key-techniques","dir":"Reference","previous_headings":"","what":"Key techniques","title":"Argument type: tidy-select — tidyr_tidy_select","text":"want user supply tidyselect specification function argument, need tunnel selection function argument. done embracing function argument {{ }}, e.g unnest(df, {{ vars }}). character vector column names, use all_of() any_of(), depending whether want unknown variable names cause error, e.g unnest(df, all_of(vars)), unnest(df, !any_of(vars)). suppress R CMD check NOTEs unknown variables use \"var\" instead var:","code":"# has NOTE df %>% select(x, y, z) # no NOTE df %>% select(\"x\", \"y\", \"z\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":null,"dir":"Reference","previous_headings":"","what":"","title":"","text":"Performs opposite operation dplyr::count(), duplicating rows according weighting variable (expression).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"","text":"","code":"uncount(data, weights, ..., .remove = TRUE, .id = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"","text":"data data frame, tibble, grouped tibble. weights vector weights. Evaluated context data; supports quasiquotation. ... Additional arguments passed methods. .remove TRUE, weights name column data, column removed. .id Supply string create new variable gives unique identifier created row.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"","text":"","code":"df <- tibble(x = c(\"a\", \"b\"), n = c(1, 2)) uncount(df, n) #> # A tibble: 3 × 1 #> x #> #> 1 a #> 2 b #> 3 b uncount(df, n, .id = \"id\") #> # A tibble: 3 × 2 #> x id #> #> 1 a 1 #> 2 b 1 #> 3 b 2 # You can also use constants uncount(df, 2) #> # A tibble: 4 × 2 #> x n #> #> 1 a 1 #> 2 a 1 #> 3 b 2 #> 4 b 2 # Or expressions uncount(df, 2 / n) #> # A tibble: 3 × 2 #> x n #> #> 1 a 1 #> 2 a 1 #> 3 b 2"},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":null,"dir":"Reference","previous_headings":"","what":"Unite multiple columns into one by pasting strings together — unite","title":"Unite multiple columns into one by pasting strings together — unite","text":"Convenience function paste together multiple columns one.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unite multiple columns into one by pasting strings together — unite","text":"","code":"unite(data, col, ..., sep = \"_\", remove = TRUE, na.rm = FALSE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unite multiple columns into one by pasting strings together — unite","text":"data data frame. col name new column, string symbol. argument passed expression supports quasiquotation (can unquote strings symbols). 
name captured expression rlang::ensym() (note kind interface symbols represent actual objects now discouraged tidyverse; support backward compatibility). ... Columns unite sep Separator use values. remove TRUE, remove input columns output data frame. na.rm TRUE, missing values removed prior uniting value.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unite multiple columns into one by pasting strings together — unite","text":"","code":"df <- expand_grid(x = c(\"a\", NA), y = c(\"b\", NA)) df #> # A tibble: 4 × 2 #> x y #> #> 1 a b #> 2 a NA #> 3 NA b #> 4 NA NA df %>% unite(\"z\", x:y, remove = FALSE) #> # A tibble: 4 × 3 #> z x y #> #> 1 a_b a b #> 2 a_NA a NA #> 3 NA_b NA b #> 4 NA_NA NA NA # To remove missing values: df %>% unite(\"z\", x:y, na.rm = TRUE, remove = FALSE) #> # A tibble: 4 × 3 #> z x y #> #> 1 \"a_b\" a b #> 2 \"a\" a NA #> 3 \"b\" NA b #> 4 \"\" NA NA # Separate is almost the complement of unite df %>% unite(\"xy\", x:y) %>% separate(xy, c(\"x\", \"y\")) #> # A tibble: 4 × 2 #> x y #> #> 1 a b #> 2 a NA #> 3 NA b #> 4 NA NA # (but note `x` and `y` contain now \"NA\" not NA)"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":null,"dir":"Reference","previous_headings":"","what":"Unnest a list-column of data frames into rows and columns — unnest","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"Unnest expands list-column containing data frames rows columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"","code":"unnest( data, cols, ..., keep_empty = FALSE, ptype = NULL, names_sep = NULL, names_repair = \"check_unique\", .drop = deprecated(), .id = deprecated(), .sep = deprecated(), .preserve = deprecated() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"data data frame. cols List-columns unnest. selecting multiple columns, values row recycled common size. ... : previously write df %>% unnest(x, y, z). Convert df %>% unnest(c(x, y, z)). previously created new variable unnest() now need explicitly mutate(). Convert df %>% unnest(y = fun(x, y, z)) df %>% mutate(y = fun(x, y, z)) %>% unnest(y). keep_empty default, get one row output element list unchopping/unnesting. means size-0 element (like NULL empty data frame vector), entire row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements single row missing values. ptype Optionally, named list column name-prototype pairs coerce cols , overriding default guessed combining individual values. Alternatively, single empty ptype can supplied, applied cols. names_sep NULL, default, outer names come inner names. string, outer names formed pasting together outer inner column names, separated names_sep. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. 
formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . .drop, .preserve : list-columns now preserved; want output use select() remove prior unnesting. .id : convert df %>% unnest(x, .id = \"id\") df %>% mutate(id = names(x)) %>% unnest(x)). .sep : use names_sep instead.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"new-syntax","dir":"Reference","previous_headings":"","what":"New syntax","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"tidyr 1.0.0 introduced new syntax nest() unnest() designed similar functions. Converting new syntax straightforward (guided message receive) just need run old analysis, can easily revert previous behaviour using nest_legacy() unnest_legacy() follows:","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"","code":"# unnest() is designed to work with lists of data frames df <- tibble( x = 1:3, y = list( NULL, tibble(a = 1, b = 2), tibble(a = 1:3, b = 3:1, c = 4) ) ) # unnest() recycles input rows for each row of the list-column # and adds a column for each column df %>% unnest(y) #> # A tibble: 4 × 4 #> x a b c #> #> 1 2 1 2 NA #> 2 3 1 3 4 #> 3 3 2 2 4 #> 4 3 3 1 4 # input rows with 0 rows in the list-column will usually disappear, # but you can keep them (generating NAs) with keep_empty = TRUE: df %>% unnest(y, keep_empty = TRUE) #> # A tibble: 5 × 4 #> x a b c #> #> 1 1 NA NA NA #> 2 2 1 2 NA #> 3 3 1 3 4 #> 4 3 2 2 4 #> 5 3 3 1 4 # Multiple columns ---------------------------------------------------------- # You can unnest multiple columns simultaneously df <- tibble( x = 1:2, y = list( tibble(a = 1, b = 2), tibble(a = 3:4, b = 5:6) ), z = list( tibble(c = 1, d = 2), tibble(c = 3:4, d = 5:6) ) ) df %>% unnest(c(y, z)) #> # A tibble: 3 × 5 #> x a b c d #> #> 1 1 1 2 1 2 #> 2 2 3 5 3 5 #> 3 2 4 6 4 6 # Compare with unnesting one column at a time, which generates # the Cartesian product df %>% unnest(y) %>% unnest(z) #> # A tibble: 5 × 5 #> x a b c d #> #> 1 1 1 2 1 2 #> 2 2 3 5 3 5 #> 3 2 3 5 4 6 #> 4 2 4 6 3 5 #> 5 2 4 6 4 6"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_auto.html","id":null,"dir":"Reference","previous_headings":"","what":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","title":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","text":"unnest_auto() picks unnest_wider() unnest_longer() inspecting inner names list-col: elements unnamed, uses unnest_longer(indices_include = FALSE). elements named, least one name common across components, uses unnest_wider(). Otherwise, falls back unnest_longer(indices_include = TRUE). 
handy rapid interactive exploration recommend using scripts, succeed even underlying data radically changes.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_auto.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","text":"","code":"unnest_auto(data, col)"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_auto.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","text":"data data frame. col List-column unnest.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":null,"dir":"Reference","previous_headings":"","what":"Unnest a list-column into rows — unnest_longer","title":"Unnest a list-column into rows — unnest_longer","text":"unnest_longer() turns element list-column row. naturally suited list-columns elements unnamed length element varies row row. unnest_longer() generally preserves number columns x modifying number rows. Learn vignette(\"rectangle\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unnest a list-column into rows — unnest_longer","text":"","code":"unnest_longer( data, col, values_to = NULL, indices_to = NULL, indices_include = NULL, keep_empty = FALSE, names_repair = \"check_unique\", simplify = TRUE, ptype = NULL, transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unnest a list-column into rows — unnest_longer","text":"data data frame. col List-column(s) unnest. selecting multiple columns, values row recycled common size. values_to string giving column name (names) store unnested values . multiple columns specified col, can also glue string containing \"{col}\" provide template column names. default, NULL, gives output columns names input columns. indices_to string giving column name (names) store inner names positions (named) values. multiple columns specified col, can also glue string containing \"{col}\" provide template column names. default, NULL, gives output columns names values_to, suffixed \"_id\". indices_include single logical value specifying whether add index column. value inner names, index column character vector names, otherwise integer vector positions. NULL, defaults TRUE value inner names indices_to provided. indices_to provided, indices_include FALSE. keep_empty default, get one row output element list unchopping/unnesting. means size-0 element (like NULL empty data frame vector), entire row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements single row missing values. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . simplify TRUE, attempt simplify lists length-1 vectors atomic vector. 
Can also named list containing TRUE FALSE declaring whether attempt simplify particular column. named list provided, default unspecified columns TRUE. ptype Optionally, named list prototypes declaring desired output type component. Alternatively, single empty prototype can supplied, applied components. Use argument want check element type expect simplifying. ptype specified, simplify = FALSE simplification possible, list-column returned element type ptype. transform Optionally, named list transformation functions applied component. Alternatively, single function can supplied, applied components. Use argument want transform parse individual elements extracted. ptype transform supplied, transform applied ptype.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unnest a list-column into rows — unnest_longer","text":"","code":"# `unnest_longer()` is useful when each component of the list should # form a row df <- tibble( x = 1:4, y = list(NULL, 1:3, 4:5, integer()) ) df %>% unnest_longer(y) #> # A tibble: 5 × 2 #> x y #> #> 1 2 1 #> 2 2 2 #> 3 2 3 #> 4 3 4 #> 5 3 5 # Note that empty values like `NULL` and `integer()` are dropped by # default. If you'd like to keep them, set `keep_empty = TRUE`. df %>% unnest_longer(y, keep_empty = TRUE) #> # A tibble: 7 × 2 #> x y #> #> 1 1 NA #> 2 2 1 #> 3 2 2 #> 4 2 3 #> 5 3 4 #> 6 3 5 #> 7 4 NA # If the inner vectors are named, the names are copied to an `_id` column df <- tibble( x = 1:2, y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12)) ) df %>% unnest_longer(y) #> # A tibble: 5 × 3 #> x y y_id #> #> 1 1 1 a #> 2 1 2 b #> 3 2 10 a #> 4 2 11 b #> 5 2 12 c # Multiple columns ---------------------------------------------------------- # If columns are aligned, you can unnest simultaneously df <- tibble( x = 1:2, y = list(1:2, 3:4), z = list(5:6, 7:8) ) df %>% unnest_longer(c(y, z)) #> # A tibble: 4 × 3 #> x y z #> #> 1 1 1 5 #> 2 1 2 6 #> 3 2 3 7 #> 4 2 4 8 # This is important because sequential unnesting would generate the # Cartesian product of the rows df %>% unnest_longer(y) %>% unnest_longer(z) #> # A tibble: 8 × 3 #> x y z #> #> 1 1 1 5 #> 2 1 1 6 #> 3 1 2 5 #> 4 1 2 6 #> 5 2 3 7 #> 6 2 3 8 #> 7 2 4 7 #> 8 2 4 8"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":null,"dir":"Reference","previous_headings":"","what":"Unnest a list-column into columns — unnest_wider","title":"Unnest a list-column into columns — unnest_wider","text":"unnest_wider() turns element list-column column. naturally suited list-columns every element named, names consistent row--row. unnest_wider() preserves rows x modifying columns. Learn vignette(\"rectangle\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unnest a list-column into columns — unnest_wider","text":"","code":"unnest_wider( data, col, names_sep = NULL, simplify = TRUE, strict = FALSE, names_repair = \"check_unique\", ptype = NULL, transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unnest a list-column into columns — unnest_wider","text":"data data frame. col List-column(s) unnest. selecting multiple columns, values row recycled common size. names_sep NULL, default, names left . 
string, outer inner names pasted together using names_sep separator. values unnested unnamed, names_sep must supplied, otherwise error thrown. names_sep supplied, names automatically generated unnamed values increasing sequence integers. simplify TRUE, attempt simplify lists length-1 vectors atomic vector. Can also named list containing TRUE FALSE declaring whether attempt simplify particular column. named list provided, default unspecified columns TRUE. strict single logical specifying whether apply strict vctrs typing rules. FALSE, typed empty values (like list() integer()) nested within list-columns treated like NULL contribute type unnested column. useful working JSON, empty values tend lose type information show list(). names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . ptype Optionally, named list prototypes declaring desired output type component. Alternatively, single empty prototype can supplied, applied components. Use argument want check element type expect simplifying. ptype specified, simplify = FALSE simplification possible, list-column returned element type ptype. transform Optionally, named list transformation functions applied component. Alternatively, single function can supplied, applied components. Use argument want transform parse individual elements extracted. ptype transform supplied, transform applied ptype.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unnest a list-column into columns — unnest_wider","text":"","code":"df <- tibble( character = c(\"Toothless\", \"Dory\"), metadata = list( list( species = \"dragon\", color = \"black\", films = c( \"How to Train Your Dragon\", \"How to Train Your Dragon 2\", \"How to Train Your Dragon: The Hidden World\" ) ), list( species = \"blue tang\", color = \"blue\", films = c(\"Finding Nemo\", \"Finding Dory\") ) ) ) df #> # A tibble: 2 × 2 #> character metadata #> #> 1 Toothless #> 2 Dory # Turn all components of metadata into columns df %>% unnest_wider(metadata) #> # A tibble: 2 × 4 #> character species color films #> #> 1 Toothless dragon black #> 2 Dory blue tang blue # Choose not to simplify list-cols of length-1 elements df %>% unnest_wider(metadata, simplify = FALSE) #> # A tibble: 2 × 4 #> character species color films #> #> 1 Toothless #> 2 Dory df %>% unnest_wider(metadata, simplify = list(color = FALSE)) #> # A tibble: 2 × 4 #> character species color films #> #> 1 Toothless dragon #> 2 Dory blue tang # You can also widen unnamed list-cols: df <- tibble( x = 1:3, y = list(NULL, 1:3, 4:5) ) # but you must supply `names_sep` to do so, which generates automatic names: df %>% unnest_wider(y, names_sep = \"_\") #> # A tibble: 3 × 4 #> x y_1 y_2 y_3 #> #> 1 1 NA NA NA #> 2 2 1 2 3 #> 3 3 4 5 NA # 0-length elements --------------------------------------------------------- # The defaults of `unnest_wider()` treat empty types (like `list()`) as `NULL`. 
json <- list( list(x = 1:2, y = 1:2), list(x = list(), y = 3:4), list(x = 3L, y = list()) ) df <- tibble(json = json) df %>% unnest_wider(json) #> # A tibble: 3 × 2 #> x y #> #> 1 #> 2 #> 3 # To instead enforce strict vctrs typing rules, use `strict` df %>% unnest_wider(json, strict = TRUE) #> # A tibble: 3 × 2 #> x y #> #> 1 #> 2 #> 3 "},{"path":"https://tidyr.tidyverse.org/dev/reference/us_rent_income.html","id":null,"dir":"Reference","previous_headings":"","what":"US rent and income data — us_rent_income","title":"US rent and income data — us_rent_income","text":"Captured 2017 American Community Survey using tidycensus package.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/us_rent_income.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"US rent and income data — us_rent_income","text":"","code":"us_rent_income"},{"path":"https://tidyr.tidyverse.org/dev/reference/us_rent_income.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"US rent and income data — us_rent_income","text":"dataset variables: GEOID FIP state identifier NAME Name state variable Variable name: income = median yearly income, rent = median monthly rent estimate Estimated value moe 90% margin error","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":null,"dir":"Reference","previous_headings":"","what":"World Health Organization TB data — who","title":"World Health Organization TB data — who","text":"subset data World Health Organization Global Tuberculosis Report, accompanying global populations. uses original codes World Health Organization. column names columns 5 60 made combining new_ : method diagnosis (rel = relapse, sn = negative pulmonary smear, sp = positive pulmonary smear, ep = extrapulmonary), gender (f = female, m = male), age group (014 = 0-14 yrs age, 1524 = 15-24, 2534 = 25-34, 3544 = 35-44 years age, 4554 = 45-54, 5564 = 55-64, 65 = 65 years older). who2 lightly modified version makes teaching basics easier tweaking variables slightly consistent dropping iso2 iso3. newrel replaced new_rel, _ added gender.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"World Health Organization TB data — who","text":"","code":"who who2 population"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"who","dir":"Reference","previous_headings":"","what":"who","title":"World Health Organization TB data — who","text":"data frame 7,240 rows 60 columns: country Country name iso2, iso3 2 & 3 letter ISO country codes year Year new_sp_m014 - new_rel_f65 Counts new TB cases recorded group. 
Column names encode three variables describe group.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"who-","dir":"Reference","previous_headings":"","what":"who2","title":"World Health Organization TB data — who","text":"data frame 7,240 rows 58 columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"population","dir":"Reference","previous_headings":"","what":"population","title":"World Health Organization TB data — who","text":"data frame 4,060 rows three columns: country Country name year Year population Population","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"World Health Organization TB data — who","text":"https://www..int/teams/global-tuberculosis-programme/data","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":null,"dir":"Reference","previous_headings":"","what":"Population data from the World Bank — world_bank_pop","title":"Population data from the World Bank — world_bank_pop","text":"Data population World Bank.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Population data from the World Bank — world_bank_pop","text":"","code":"world_bank_pop"},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Population data from the World Bank — world_bank_pop","text":"dataset variables: country Three letter country code indicator Indicator name: SP.POP.GROW = population growth, SP.POP.TOTL = total population, SP.URB.GROW = urban population growth, SP.URB.TOTL = total urban population 2000-2018 Value year","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Population data from the World Bank — world_bank_pop","text":"Dataset World Bank data bank: https://data.worldbank.org","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-development-version","dir":"Changelog","previous_headings":"","what":"tidyr (development version)","title":"tidyr (development version)","text":"tidyr now requires dplyr >=1.1.0 (#1568, @catalamarti).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-131","dir":"Changelog","previous_headings":"","what":"tidyr 1.3.1","title":"tidyr 1.3.1","text":"CRAN release: 2024-01-24 pivot_wider now uses .|> syntax dplyr helper message identify duplicates (@boshek, #1516)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-130","dir":"Changelog","previous_headings":"","what":"tidyr 1.3.0","title":"tidyr 1.3.0","text":"CRAN release: 2023-01-24","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-1-3-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 1.3.0","text":"New family consistent string separating functions: separate_wider_delim(), separate_wider_position(), separate_wider_regex(), separate_longer_delim(), separate_longer_position(). functions thorough refreshes separate() extract(), featuring improved performance, greater consistency, polished API, new approach handling problems. use stringr supersede extract(), separate(), separate_rows() (#1304). 
named character vector interface used separate_wider_regex() similar nc package Toby Dylan Hocking. nest() gains .argument allows specify columns nest (rather columns nest, .e. ...). Additionally, .key argument longer deprecated, used whenever ... isn’t specified (#1458). unnest_longer() gains keep_empty argument like unnest() (#1339). pivot_longer() gains cols_vary argument controlling ordering output rows relative original row number (#1312). New datasets who2, household, cms_patient_experience, cms_patient_care demonstrate various tidying challenges (#1333).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-1-3-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 1.3.0","text":"... argument pivot_longer() pivot_wider() moved front function signature, required arguments optional ones. Additionally, pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), build_wider_spec() gained ... arguments similar location. change allows us easily add new features pivoting functions without breaking existing CRAN packages user scripts. pivot_wider() provides temporary backwards compatible support case single unnamed argument previously positionally matched id_cols. one special case still works, throw warning encouraging explicitly name id_cols argument. read pattern, see https://design.tidyverse.org/dots--required.html (#1350).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"lifecycle-changes-1-3-0","dir":"Changelog","previous_headings":"","what":"Lifecycle changes","title":"tidyr 1.3.0","text":"functions deprecated tidyr 1.0 1.2 (old lazyeval functions ending _ various arguments unnest()) now warn every use. made defunct 2024 (#1406).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-3-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.3.0","text":"unnest_longer() now consistently drops rows either NULL empty vectors (like integer()) default. Set new keep_empty argument TRUE retain . Previously, keep_empty = TRUE implicitly used NULL, keep_empty = FALSE used empty vectors, inconsistent tidyr verbs argument (#1363). unnest_longer() now uses \"\" index column fully unnamed vectors. also now consistently uses NA index column empty vectors “kept” keep_empty = TRUE (#1442). unnest_wider() now errors values unnested unnamed names_sep provided (#1367). unnest_wider() now generates automatic names partially unnamed vectors. Previously generated fully unnamed vectors, resulting strange mix automatic names name-repaired names (#1367).","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"general-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"General","title":"tidyr 1.3.0","text":"tidyr functions now consistently disallow renaming tidy-selection. Renaming never meaningful functions, previously either effect caused problems (#1449, #1104). tidyr errors (including input validation) thoroughly reviewed generally likely point right direction (#1313, #1400). uncount() now generic implementations can provided objects data frames (@mgirlich, #1358). uncount() gains ... argument. comes required optional arguments (@mgirlich, #1358). nest(), complete(), expand(), fill() now document support grouped data frames created dplyr::group_by() (#952). built datasets now standard tibbles (#1459). 
R >=3.4.0 now required, line tidyverse standard supporting previous 5 minor releases R. rlang >=1.0.4 vctrs >=0.5.2 now required (#1344, #1470). Removed dependency ellipsis favor equivalent functions rlang (#1314).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-packing-and-chopping-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Nesting, packing, and chopping","title":"tidyr 1.3.0","text":"unnest(), unchop(), unnest_longer(), unnest_wider() better handle lists additional classes (#1327). pack(), unpack(), chop(), unchop() gain error_call argument, turn improves error calls shown nest() various unnest() adjacent functions (#1446). chop(), unpack(), unchop() gain ..., must empty (#1447). unpack() better job reporting column name duplication issues gives better advice resolve using names_sep. also improves errors functions use unpack(), like unnest() unnest_wider() (#1425, #1367).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Pivoting","title":"tidyr 1.3.0","text":"pivot_longer() longer supports interpreting values_ptypes = list() names_ptypes = list() NULL. empty list() now interpreted prototype apply columns, consistent 0-length value interpreted (#1296). pivot_longer(values_drop_na = TRUE) faster aren’t missing values drop (#1392, @mgirlich). pivot_longer() now memory efficient due usage vctrs::vec_interleave() (#1310, @mgirlich). pivot_longer() now throws slightly better error message values_ptypes names_ptypes provided coercion can’t made (#1364). pivot_wider() now throws better error message column selected names_from values_from also selected id_cols (#1318). pivot_wider() now faster names_sep provided (@mgirlich, #1426). pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), build_wider_spec() gain error_call argument, resulting better error reporting pivot_longer() pivot_wider() (#1408).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"missing-values-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Missing values","title":"tidyr 1.3.0","text":"fill() now works correctly column named .direction data (#1319, @tjmahr). replace_na() faster aren’t missing values replace (#1392, @mgirlich). documentation replace argument replace_na() now mentions replace always cast type data (#1317).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-121","dir":"Changelog","previous_headings":"","what":"tidyr 1.2.1","title":"tidyr 1.2.1","text":"CRAN release: 2022-09-08 Hot patch release resolve R CMD check failures.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-120","dir":"Changelog","previous_headings":"","what":"tidyr 1.2.0","title":"tidyr 1.2.0","text":"CRAN release: 2022-02-01","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-1-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 1.2.0","text":"complete() expand() longer allow complete expand grouping column. never well-defined since completion/expansion grouped data frame happens “within” group otherwise potential produce erroneous results (#1299). replace_na() longer allows type data change replacement applied. replace now always cast type data replacement made. example, means using replacement value 1.5 integer column longer allowed. 
Similarly, replacing missing values list-column must now done list(\"foo\") rather just \"foo\".","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-2-0","dir":"Changelog","previous_headings":"","what":"Pivoting","title":"tidyr 1.2.0","text":"pivot_wider() gains new names_expand id_expand arguments turning implicit missing factor levels variable combinations explicit ones. similar drop argument spread() (#770). pivot_wider() gains new names_vary argument controlling ordering combining names_from values values_from column names (#839). pivot_wider() gains new unused_fn argument controlling summarize unused columns aren’t involved pivoting process (#990, thanks @mgirlich initial implementation). pivot_longer()’s names_transform values_transform arguments now accept single function applied columns (#1284, thanks @smingerson initial implementation). pivot_longer()’s names_ptypes values_ptypes arguments now accept single empty ptype applied columns (#1284).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-2-0","dir":"Changelog","previous_headings":"","what":"Nesting","title":"tidyr 1.2.0","text":"unnest() unchop()’s ptype argument now accepts single empty ptype applied cols (#1284). unpack() now silently skips non-data frame columns specified cols. matches existing behavior unchop() unnest() (#1153).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-2-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.2.0","text":"unnest_wider() unnest_longer() can now unnest multiple columns (#740). unnest_longer()’s indices_to values_to arguments now accept glue specification, useful unnesting multiple columns. hoist(), unnest_longer(), unnest_wider(), ptype supplied, column can’t simplified, result list-column element type ptype (#998). unnest_wider() gains new strict argument controls whether strict vctrs typing rules applied. defaults FALSE backwards compatibility, often useful lax unnesting JSON, doesn’t always map one--one R’s types (#1125). hoist(), unnest_longer(), unnest_wider()’s simplify argument now accepts named list TRUE FALSE control simplification per column basis (#995). hoist(), unnest_longer(), unnest_wider()’s transform argument now accepts single function applied components (#1284). hoist(), unnest_longer(), unnest_wider()’s ptype argument now accepts single empty ptype applied components (#1284).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"grids-1-2-0","dir":"Changelog","previous_headings":"","what":"Grids","title":"tidyr 1.2.0","text":"complete() gains new explicit argument limiting fill implicit missing values. useful don’t want fill pre-existing missing values (#1270). complete() gains grouped data frame method. generates correct completed data frame groups involved (#396, #966).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"missing-values-1-2-0","dir":"Changelog","previous_headings":"","what":"Missing values","title":"tidyr 1.2.0","text":"drop_na(), replace_na(), fill() updated utilize vctrs. means can use functions wider variety column types, including lubridate’s Period types (#1094), data frame columns, rcrd type vctrs. replace_na() longer replaces empty atomic elements list-columns (like integer(0)). value replaced list-column NULL (#1168). drop_na() longer drops empty atomic elements list-columns (like integer(0)). 
value dropped list-column NULL (#1228).","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"general-1-2-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"General","title":"tidyr 1.2.0","text":"@mgirlich now tidyr author recognition significant sustained contributions. lazyeval variants tidyr verbs soft-deprecated. Expect move defunct stage next minor release tidyr (#1294). any_of() all_of() tidyselect now re-exported (#1217). dplyr >= 1.0.0 now required.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Pivoting","title":"tidyr 1.2.0","text":"pivot_wider() now gives better advice identify duplicates values uniquely identified (#1113). pivot_wider() now throws informative error values_fn doesn’t result single summary value (#1238). pivot_wider() pivot_longer() now generate informative errors related name repair (#987). pivot_wider() now works correctly values_fill data frame. pivot_wider() longer accidentally retains values_from pivoting zero row data frame (#1249). pivot_wider() now correctly handles case id column name collides value names_from (#1107). pivot_wider() pivot_longer() now check spec columns .name .value character vectors. Additionally, .name column must unique (#1107). pivot_wider()’s names_from values_from arguments now required default values name value don’t correspond columns data. Additionally, must identify least 1 column data (#1240). pivot_wider()’s values_fn argument now correctly allows anonymous functions (#1114). pivot_wider_spec() now works correctly 0-row data frame spec doesn’t identify rows (#1250, #1252). pivot_longer()’s names_ptypes argument now applied names_transform consistency rectangling functions (.e. hoist()) (#1233). check_pivot_spec() new developer facing function validating pivot spec argument. useful extending pivot_longer() pivot_wider() new S3 methods (#1087).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Nesting","title":"tidyr 1.2.0","text":"nest() generic now avoids computing .data, making compatible lazy tibbles (#1134). .names_sep argument data.frame method nest() now actually used (#1174). unnest()’s ptype argument now works expected (#1158). unpack() longer drops empty columns specified cols (#1191). unpack() now works correctly data frame columns containing 1 row 0 columns (#1189). chop() now works correctly data frames 0 rows (#1206). chop()’s cols argument longer optional. matches behavior cols seen elsewhere tidyr (#1205). unchop() now respects ptype unnesting non-list column (#1211).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Rectangling","title":"tidyr 1.2.0","text":"hoist() longer accidentally removes elements duplicated names (#1259).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"grids-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Grids","title":"tidyr 1.2.0","text":"grouped data frame methods complete() expand() now move group columns front result (addition columns completed expanded, already moved front). make intuitive sense, completing expanding “within” group, group columns first thing see (#1289). 
complete() now applies fill even columns complete specified (#1272). expand(), crossing(), nesting() now correctly retain NA values factors (#1275). expand_grid(), expand(), nesting(), crossing() now silently apply name repair automatically named inputs. avoids number issues resulting duplicate truncated names (#1116, #1221, #1092, #1037, #992). expand_grid(), expand(), nesting(), crossing() now allow columns unnamed data frames used expressions data frame specified, like expand_grid(tibble(x = 1), y = x). consistent tibble() behaves. expand_grid(), expand(), nesting(), crossing() now work correctly data frames containing 0 columns >0 rows (#1189). expand_grid(), expand(), nesting(), crossing() now return 1 row data frame inputs supplied, consistent prod() == 1L idea computations involving number combinations computed empty set return 1 (#1258).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"missing-values-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Missing values","title":"tidyr 1.2.0","text":"drop_na() longer drops missing values columns tidyselect expression results 0 columns selected used (#1227). fill() now treats NaN like missing value (#982).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-114","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.4","title":"tidyr 1.1.4","text":"CRAN release: 2021-09-27 expand_grid() now twice fast pivot_wider() bit faster (@mgirlich, #1130). unchop() now much faster, propagates various functions, unnest(), unnest_longer(), unnest_wider(), separate_rows() (@mgirlich, @DavisVaughan, #1127). unnest() now much faster (@mgirlich, @DavisVaughan, #1127). unnest() longer allows unnesting list-col containing mix vector data frame elements. Previously, worked accident, considered -label usage unnest() now become error.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-113","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.3","title":"tidyr 1.1.3","text":"CRAN release: 2021-03-03 tidyr verbs longer “default” methods lazyeval fallbacks. means ’ll get clearer error messages (#1036). uncount() error non-integer weights gives clearer error message negative weights (@mgirlich, #1069). can unnest dates (#1021, #1089). pivot_wider() works data.table empty key variables (@mgirlich, #1066). separate_rows() works factor columns (@mgirlich, #1058).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-112","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.2","title":"tidyr 1.1.2","text":"CRAN release: 2020-08-27 separate_rows() returns 1.1.0 behaviour empty strings (@rjpatm, #1014).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-111","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.1","title":"tidyr 1.1.1","text":"CRAN release: 2020-07-31 New tidyr logo! stringi dependency removed; substantial dependency make tidyr hard compile resource constrained environments (@rjpat, #936). Replace Rcpp cpp11. 
See https://cpp11.r-lib.org/articles/motivations.html reasons .","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-110","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.0","title":"tidyr 1.1.0","text":"CRAN release: 2020-05-20","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"general-features-1-1-0","dir":"Changelog","previous_headings":"","what":"General features","title":"tidyr 1.1.0","text":"pivot_longer(), hoist(), unnest_wider(), unnest_longer() gain new transform arguments; allow transform values “flight”. partly needed vctrs coercion rules become stricter, give greater flexibility available previously (#921). Arguments use tidy selection syntax now clearly documented updated use tidyselect 1.1.0 (#872).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-improvements-1-1-0","dir":"Changelog","previous_headings":"","what":"Pivoting improvements","title":"tidyr 1.1.0","text":"pivot_wider() pivot_longer() considerably performant, thanks largely improvements underlying vctrs code (#790, @DavisVaughan). pivot_longer() now supports names_to = character() prevents name column created (#961). {r} df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6) df %>% pivot_longer(-id, names_to = character()) pivot_longer() longer creates .copy variable presence duplicate column names. makes consistent handling non-unique specs. pivot_longer() automatically disambiguates non-unique ouputs, can occur input variables include additional component don’t care want discard (#792, #793). {r} df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6) df %>% pivot_longer(-id, names_pattern = \"(.)_.\") df %>% pivot_longer(-id, names_sep = \"_\", names_to = c(\"name\", NA)) df %>% pivot_longer(-id, names_sep = \"_\", names_to = c(\".value\", NA)) pivot_wider() gains names_sort argument allows sort column names order. default, FALSE, orders columns first appearance (#839). future version, ’ll consider changing default TRUE. pivot_wider() gains names_glue argument allows construct output column names glue specification. pivot_wider() arguments values_fn values_fill can now single values; now need use named list want use different values different value columns (#739, #746). also get improved errors ’re expected type.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-1-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.1.0","text":"hoist() now automatically names pluckers single string (#837). error use duplicated column names (@mgirlich, #834), now uses rlang::list2() behind scenes (means can now use !!! :=) (#801). unnest_longer(), unnest_wider(), hoist() better job simplifying list-cols. longer add unneeded unspecified() result still list (#806), work list contains non-vectors (#810, #848). unnest_wider(names_sep = \"\") now provides default names unnamed inputs, suppressing many previous name repair messages (#742).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-1-0","dir":"Changelog","previous_headings":"","what":"Nesting","title":"tidyr 1.1.0","text":"pack() nest() gains .names_sep argument allows strip outer names inner names, symmetrical way argument unpack() unnest() combines inner outer names (#795, #797). unnest_wider() unnest_longer() can now unnest list_of columns. 
important unnesting columns created nest() pivot_wider(), create list_of columns id columns non-unique (#741).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-1-1-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 1.1.0","text":"chop() now creates list-columns class vctrs::list_of(). helps keep track type case chopped data frame empty, allowing unchop() reconstitute data frame correct number types column even observations. drop_na() now preserves attributes unclassed vectors (#905). expand(), expand_grid(), crossing(), nesting() evaluate inputs iteratively, can refer freshly created columns, e.g. crossing(x = seq(-2, 2), y = x) (#820). expand(), expand_grid(), crossing(), nesting() gain .name_repair giving control name repair strategy (@jeffreypullin, #798). extract() lets use NA , documented (#793). extract(), separate(), hoist(), unnest_longer(), unnest_wider() give better error message col missing (#805). pack()’s first argument now .data instead data (#759). pivot_longer() now errors values_to length-1 character vector (#949). pivot_longer() pivot_wider() now generic implementations can provided objects data frames (#800). pivot_wider() can now pivot data frame columns (#926) unite(na.rm = TRUE) now works types variable, just character vectors (#765). unnest_wider() gives better error message attempt unnest multiple columns (#740). unnest_auto() works input data contains column called col (#959).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-102","dir":"Changelog","previous_headings":"","what":"tidyr 1.0.2","title":"tidyr 1.0.2","text":"CRAN release: 2020-01-24 Minor fixes dev versions rlang, tidyselect, tibble.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-101","dir":"Changelog","previous_headings":"","what":"tidyr 1.0.1","title":"tidyr 1.0.1","text":"exist since accidentally released v1.0.2","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-100","dir":"Changelog","previous_headings":"","what":"tidyr 1.0.0","title":"tidyr 1.0.0","text":"CRAN release: 2019-09-11","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-1-0-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 1.0.0","text":"See vignette(\"-packages\") detailed transition guide. nest() unnest() new syntax. majority existing usage automatically translated new syntax warning. doesn’t work, put script use old versions can take closer look update code: nest() now preserves grouping, implications downstream calls group-aware functions, dplyr::mutate() filter(). first argument nest() changed data .data. unnest() uses emerging tidyverse standard disambiguate unique names. Use names_repair = tidyr_legacy request previous approach. unnest_()/nest_() lazyeval methods unnest()/nest() now defunct. deprecated time, , since interface changed, package authors need update avoid deprecation warnings. think one clean break less work everyone. lazyeval functions formally deprecated, made defunct next major release. (See lifecycle vignette details deprecation stages). crossing() nesting() now return 0-row outputs input length-0 vector. want preserve previous behaviour silently dropped inputs, convert empty vectors NULL. 
(discussion general pattern https://github.com/tidyverse/principles/issues/24)","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-0-0","dir":"Changelog","previous_headings":"","what":"Pivoting","title":"tidyr 1.0.0","text":"New pivot_longer() pivot_wider() provide modern alternatives spread() gather(). carefully redesigned easier learn remember, include many new features. Learn vignette(\"pivot\"). functions resolve multiple existing issues spread()/gather(). functions now handle mulitple value columns (#149/#150), support vector types (#333), use tidyverse conventions duplicated column names (#496, #478), symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), can directly split column names multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), control generated column names (#208). demonstrate functions work practice, tidyr gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters world_bank_pop. Finally, tidyr demos removed. dated, superseded vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-0-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.0.0","text":"tidyr contains four new functions support rectangling, turning deeply nested list tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), hoist(). documented new vignette: vignette(\"rectangle\"). unnest_longer() unnest_wider() make easier unnest list-columns vectors either rows columns (#418). unnest_auto() automatically picks _longer() _wider() using heuristics based presence common names. New hoist() provides convenient way plucking components list-column top-level columns (#341). particularly useful working deeply nested JSON, provides convenient shortcut mutate() + map() pattern: {r} df %>% hoist(metadata, name = \"name\") # shortcut df %>% mutate(name = map_chr(metadata, \"name\"))","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-0-0","dir":"Changelog","previous_headings":"","what":"Nesting","title":"tidyr 1.0.0","text":"nest() unnest() updated new interfaces closely aligned evolving tidyverse conventions. use theory developed vctrs consistently handle mixtures input types, arguments overhauled based last years experience. supported new vignette(\"nest\"), outlines main ideas nested data (’s still rough, get better time). biggest change operation multiple columns: df %>% unnest(x, y, z) becomes df %>% unnest(c(x, y, z)) df %>% nest(x, y, z) becomes df %>% nest(data = c(x, y, z)). done best ensure common uses nest() unnest() continue work, generating informative warning telling precisely need update code. Please file issue ’ve missed important use case. unnest() overhauled: New keep_empty parameter ensures every row input gets least one row output, inserting missing values needed (#358). Provides names_sep argument control inner outer column names combined. Uses standard tidyverse name-repair rules, default get error output contain multiple columns name. can override using name_repair (#514). 
Now supports NULL entries (#436).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"packing-and-chopping-1-0-0","dir":"Changelog","previous_headings":"","what":"Packing and chopping","title":"tidyr 1.0.0","text":"hood, nest() unnest() implemented chop(), pack(), unchop(), unpack(): pack() unpack() allow pack unpack columns data frame columns (#523). chop() unchop() chop rows sets list-columns. Packing chopping interesting primarily atomic operations underlying nesting (similarly, unchop unpacking underlie unnesting), don’t expect used directly often.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-1-0-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 1.0.0","text":"New expand_grid(), tidy version expand.grid(), lower-level existing expand() crossing() functions, takes individual vectors, sort uniquify . crossing(), nesting(), expand() rewritten use vctrs package. affect much existing code, considerably simplifies implementation ensures functions work consistently across generalised vectors (#557). part alignment, functions now drop NULL inputs, 0-length vector.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-1-0-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 1.0.0","text":"full_seq() now also works gaps observations shorter given period, within tolerance given tol. Previously, gaps consecutive observations range [period, period + tol]; gaps can now range [period - tol, period + tol] (@ha0ye, #657). tidyr now re-exports tibble(), as_tibble(), tribble(), well tidyselect helpers (starts_with(), ends_with(), …). makes generating documentation, reprexes, tests easier, makes tidyr easier use without also attaching dplyr. functions take ... instrumented functions ellipsis package warn ’ve supplied arguments ignored (typically ’ve misspelled argument name) (#573). complete() now uses full_join() levels preserved even levels specified (@Ryo-N7, #493). crossing() now takes unique values data frame inputs, just vector inputs (#490). gather() throws error column data frame (#553). extract() (hence pivot_longer()) can extract multiple input values single output column (#619). fill() now implemented using dplyr::mutate_at(). radically simplifies implementation considerably improves performance working grouped data (#520). fill() now accepts downup updown fill directions (@coolbutuseless, #505). unite() gains na.rm argument, making easier remove missing values prior uniting values together (#203)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-083","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.3","title":"tidyr 0.8.3","text":"CRAN release: 2019-03-01 crossing() preserves factor levels (#410), now works list-columns (#446, @SamanthaToet). (also help expand() built top crossing()) nest() compatible dplyr 0.8.0. spread() works id variable names (#525). unnest() preserves column unnested input zero-length (#483), using list_of() attribute correctly restore columns, possible. unnest() run named unnamed list-columns length (@hlendway, #460).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-082","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.2","title":"tidyr 0.8.2","text":"CRAN release: 2018-10-28 separate() now accepts NA column name argument denote columns omitted result. (@markdly, #397). 
Minor updates ensure compatibility dependencies.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-081","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.1","title":"tidyr 0.8.1","text":"CRAN release: 2018-05-18 unnest() weakens test “atomicity” restore previous behaviour unnesting factors dates (#407).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-080","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.0","title":"tidyr 0.8.0","text":"CRAN release: 2018-01-29","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-0-8-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 0.8.0","text":"deliberate breaking changes release. However, number packages failing errors related numbers elements columns, row names. possible accidental API changes new bugs. see error package, sincerely appreciate minimal reprex. separate() now correctly uses -1 refer far right position, instead -2. depended behaviour, ’ll need switch packageVersion(\"tidyr\") > \"0.7.2\"","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-0-8-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 0.8.0","text":"Increased test coverage 84% 99%. uncount() performs inverse operation dplyr::count() (#279)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-8-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.8.0","text":"complete(data) now returns data rather throwing error (#390). complete() zero-length completions returns original input (#331). crossing() preserves NAs (#364). expand() empty input gives empty data frame instead NULL (#331). expand(), crossing(), complete() now complete empty factors instead dropping (#270, #285) extract() better error message regex contain expected number groups (#313). drop_na() longer drops columns (@jennybryan, #245), works list-cols (#280). Equivalent NA list column empty (length 0) data structure. nest() now faster, especially long data frame collapsed nested data frame rows. nest() zero-row data frame works expected (#320). replace_na() longer complains try replace missing values variables present data (#356). replace_na() now also works vectors (#342, @flying-sheep), can replace NULL list-columns. throws better error message attempt replace something length 1. separate() longer checks ... empty, allowing methods make use . check added tidyr 0.4.0 (2016-02-02) deprecate previous behaviour ... passed strsplit(). separate() extract() now insert columns correct position drop = TRUE (#394). separate() now works correctly counts RHS using negative integer sep values (@markdly, #315). separate() gets improved warning message pieces aren’t expected (#375). separate_rows() supports list columns (#321), works empty tibbles. spread() now consistently returns 0 row outputs 0 row inputs (#269). spread() now works key column includes NA drop FALSE (#254). spread() longer returns tibbles row names (#322). spread(), separate(), extract() (#255), gather() (#347) now replace existing variables rather creating invalid data frame duplicated variable names (matching semantics mutate). unite() now works (documented) don’t supply variables (#355). unnest() gains preserve argument allows preserve list columns without unnesting (#328). unnest() can unnested list-columns contains lists lists (#278). 
unnest(df) now works df contains list-cols (#344)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-072","dir":"Changelog","previous_headings":"","what":"tidyr 0.7.2","title":"tidyr 0.7.2","text":"CRAN release: 2017-10-16 SE variants gather_(), spread_() nest_() now treat non-syntactic names way pre tidy eval versions tidyr (#361). Fix tidyr bug revealed R-devel.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-071","dir":"Changelog","previous_headings":"","what":"tidyr 0.7.1","title":"tidyr 0.7.1","text":"CRAN release: 2017-09-01 hotfix release account tidyselect changes unit tests. Note upcoming version tidyselect backtracks changes announced 0.7.0. special evaluation semantics selection changed back old behaviour new rules causing much trouble confusion. now data expressions (symbols calls : c()) can refer registered variables objects context. However semantics context expressions (calls : c()) remain . expressions evaluated context refer registered variables. ’re writing functions refer contextual objects, still good idea avoid data expressions following advice 0.7.0 release notes.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-070","dir":"Changelog","previous_headings":"","what":"tidyr 0.7.0","title":"tidyr 0.7.0","text":"CRAN release: 2017-08-16 release includes important changes tidyr internals. Tidyr now supports new tidy evaluation framework quoting (NSE) functions. also uses new tidyselect package selecting backend.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-0-7-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 0.7.0","text":"see error messages objects functions found, likely selecting functions now stricter arguments example selecting function gather() ... argument. change makes code robust disallowing ambiguous scoping. Consider following code: select first three columns (using x defined global environment), select first two columns (using column named x)? solve ambiguity, now make strict distinction data context expressions. data expression either bare name expression like x:y c(x, y). data expression, can refer columns data frame. Everything else context expression can refer objects defined <-. practice means can longer refer contextual objects like : now explicit find objects. , can use quasiquotation operator !! evaluate argument early inline result: {r} mtcars %>% gather(var, value, !! 1:ncol(mtcars)) mtcars %>% gather(var, value, !! 1:x) mtcars %>% gather(var, value, !! -(1:x)) alternative turn data expression context expression using seq() seq_len() instead :. See section tidyselect information semantics. Following switch tidy evaluation, might see warnings “variable context set”. likely caused supplying helpers like everything() underscored versions tidyr verbs. Helpers always evaluated lazily. fix , just quote helper formula: drop_na(df, ~everything()). selecting functions now stricter supply integer positions. see error along lines please round positions supplying tidyr. Double vectors fine long rounded.","code":"x <- 3 df <- tibble(w = 1, x = 2, y = 3) gather(df, \"variable\", \"value\", 1:x) mtcars %>% gather(var, value, 1:ncol(mtcars)) x <- 3 mtcars %>% gather(var, value, 1:x) mtcars %>% gather(var, value, -(1:x)) `-0.949999999999999`, `-0.940000000000001`, ... 
must resolve to integer column positions, not a double vector"},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"switch-to-tidy-evaluation-0-7-0","dir":"Changelog","previous_headings":"","what":"Switch to tidy evaluation","title":"tidyr 0.7.0","text":"tidyr now tidy evaluation grammar. See programming vignette dplyr practical information tidy evaluation. tidyr port bit special. philosophy tidy evaluation R code refer real objects (data frame context), make exceptions rule tidyr. reason several functions accept bare symbols specify names new columns create (gather() prime example). tidy symbol represent actual object. workaround capture arguments using rlang::quo_name() (still support quasiquotation can unquote symbols strings). type NSE now discouraged tidyverse: symbols R code represent real objects. Following switch tidy eval underscored variants softly deprecated. However remain around time without warning backward compatibility.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"switch-to-the-tidyselect-backend-0-7-0","dir":"Changelog","previous_headings":"","what":"Switch to the tidyselect backend","title":"tidyr 0.7.0","text":"selecting backend dplyr extracted standalone package tidyselect tidyr now uses selecting variables. used selecting multiple variables (drop_na()) well single variables (col argument extract() separate(), key value arguments spread()). implies following changes: arguments selecting single variable now support features dplyr::pull(). can supply name position, including negative positions. Multiple variables now selected bit differently. now make strict distinction data context expressions. data expression either bare name expression like x:y c(x, y). data expression, can refer columns data frame. Everything else context expression can refer objects defined <-. can still refer contextual objects data expression explicit. One way explicit unquote variable environment tidy eval operator !!: hand, select helpers like starts_with() context expressions. therefore easy refer objects never ambiguous data columns: {r} x <- \"d\" drop_na(df, starts_with(x)) special rules contrast dplyr tidyr verbs (data context scope) make sense selecting functions provide robust helpful semantics.","code":"x <- 2 drop_na(df, 2) # Works fine drop_na(df, x) # Object 'x' not found drop_na(df, !! 
x) # Works as if you had supplied 2"},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-063","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.3","title":"tidyr 0.6.3","text":"CRAN release: 2017-05-15 Patch tests compatible dev tibble","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-062","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.2","title":"tidyr 0.6.2","text":"CRAN release: 2017-05-04 Register C functions Added package docs Patch tests compatible dev dplyr.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-061","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.1","title":"tidyr 0.6.1","text":"CRAN release: 2017-01-10 Patch test compatible dev tibble Changed deprecation message extract_numeric() point readr::parse_number() rather readr::parse_numeric()","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-060","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.0","title":"tidyr 0.6.0","text":"CRAN release: 2016-08-12","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"api-changes-0-6-0","dir":"Changelog","previous_headings":"","what":"API changes","title":"tidyr 0.6.0","text":"drop_na() removes observations NA given variables. variables given, variables considered (#194, @janschulz). extract_numeric() deprecated (#213). Renamed table4 table5 table4a table4b make connection clear. key value variables table2 renamed type count.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-6-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.6.0","text":"expand(), crossing(), nesting() now silently drop zero-length inputs. crossing_() nesting_() versions crossing() nesting() take list input. full_seq() works correctly dates date/times.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-051","dir":"Changelog","previous_headings":"","what":"tidyr 0.5.1","title":"tidyr 0.5.1","text":"CRAN release: 2016-06-14 Restored compatibility R < 3.3.0 avoiding getS3method(envir = ) (#205, @krlmlr).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-050","dir":"Changelog","previous_headings":"","what":"tidyr 0.5.0","title":"tidyr 0.5.0","text":"CRAN release: 2016-06-12","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-functions-0-5-0","dir":"Changelog","previous_headings":"","what":"New functions","title":"tidyr 0.5.0","text":"separate_rows() separates observations multiple delimited values separate rows (#69, @aaronwolen).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-5-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.5.0","text":"complete() preserves grouping created dplyr (#168). expand() (hence complete()) preserves ordered attribute factors (#165). full_seq() preserve attributes dates date/times (#156), sequences longer need start 0. gather() can now gather together list columns (#175), gather_.data.frame(na.rm = TRUE) now removes missing values ’re actually present (#173). nest() returns correct output every variable nested (#186). separate() fills right--left (left--right!) fill = “left” (#170, @dgrtwo). separate() unite() now automatically drop removed variables grouping (#159, #177). spread() gains sep argument. 
-null, name columns “keyvalue”. Additionally, sep NULL missing values converted (#68). spread() works presence list-columns (#199) unnest() works non-syntactic names (#190). unnest() gains sep argument. non-null, rename columns nested data frames include original column name, nested column name, separated .sep (#184). unnest() gains .id argument works way bind_rows(). useful named list data frames vectors (#125). Moved useful sample datasets DSR package. Made compatible dplyr 0.4 0.5. tidyr functions create new columns aggressive re-encoding column names UTF-8.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-041","dir":"Changelog","previous_headings":"","what":"tidyr 0.4.1","title":"tidyr 0.4.1","text":"CRAN release: 2016-02-05 Fixed bug nest() nested data ending wrong row (#158).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-040","dir":"Changelog","previous_headings":"","what":"tidyr 0.4.0","title":"tidyr 0.4.0","text":"CRAN release: 2016-01-18","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nested-data-frames-0-4-0","dir":"Changelog","previous_headings":"","what":"Nested data frames","title":"tidyr 0.4.0","text":"nest() unnest() overhauled support useful way structuring data frames: nested data frame. grouped data frame, one row per observation, additional metadata define groups. nested data frame, one row per group, individual observations stored column list data frames. useful structure lists objects (like models) one element per group. nest() now produces single list data frames called “data” rather list column variable. Nesting variables included nested data frames. also works grouped data frames made dplyr::group_by(). can override default column name .key. unnest() gains .drop argument controls happens list columns. default, ’re kept output doesn’t require row duplication; otherwise ’re dropped. unnest() now mutate() semantics ... - allows unnest transformed columns easily. (Previously used select semantics).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"expanding-0-4-0","dir":"Changelog","previous_headings":"","what":"Expanding","title":"tidyr 0.4.0","text":"expand() allows evaluate arbitrary expressions like full_seq(year). previously using c() created nested combinations, ’ll now need use nesting() (#85, #121). nesting() crossing() allow create nested crossed data frames individual vectors. crossing() similar base::expand.grid() full_seq(x, period) creates full sequence values min(x) max(x) every period values.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"minor-bug-fixes-and-improvements-0-4-0","dir":"Changelog","previous_headings":"","what":"Minor bug fixes and improvements","title":"tidyr 0.4.0","text":"fill() fills NULLs list-columns. fill() gains direction argument can fill either upwards downwards (#114). gather() now stores key column character, default. revert previous behaviour using factor (allows preserve ordering columns), use key_factor = TRUE (#96). tidyr verbs right thing grouped data frames created group_by() (#122, #129, #81). seq_range() removed. never used announced. spread() creates columns mixed type convert = TRUE (#118, @jennybc). spread() drop = FALSE handles zero-length factors (#56). spread()ing data frame key value columns creates one row output (#41). unite() now removes old columns adding new (#89, @krlmlr). 
separate() now warns defunct … argument used (#151, @krlmlr).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-031","dir":"Changelog","previous_headings":"","what":"tidyr 0.3.1","title":"tidyr 0.3.1","text":"CRAN release: 2015-09-10 Fixed bug attributes non-gather columns lost (#104)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-030","dir":"Changelog","previous_headings":"","what":"tidyr 0.3.0","title":"tidyr 0.3.0","text":"CRAN release: 2015-09-08","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-0-3-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 0.3.0","text":"New complete() provides wrapper around expand(), left_join() replace_na() common task: completing data frame missing combinations variables. fill() fills missing values column last non-missing value (#4). New replace_na() makes easy replace missing values something meaningful data. nest() complement unnest() (#3). unnest() can now work multiple list-columns time. don’t supply columns names, unlist list-columns (#44). unnest() can also handle columns lists data frames (#58).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-3-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.3.0","text":"tidyr longer depends reshape2. fix issues also try load reshape (#88). %>% re-exported magrittr. expand() now supports nesting crossing (see examples details). comes expense creating new variables inline (#46). expand_ SE evaluation correctly can pass character vector columns names (list formulas etc) (#70). extract() 10x faster now uses stringi instead base R regular expressions. also returns NA instead throwing error regular expression doesn’t match (#72). extract() separate() preserve character vectors convert TRUE (#99). internals spread() rewritten, now preserve attributes input value column. means can now spread date (#62) factor (#35) inputs. spread() gives informative error message key value don’t exist input data (#36). separate() displays first 20 failures (#50). finer control happens two matches: can fill missing values either “left” “right” (#49). separate() longer throws error number pieces aren’t expected - instead uses drops extra values fills right gives warning. input NA separate() extract() return silently return NA outputs, rather throwing error. (#77) Experimental unnest() method lists removed.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-020","dir":"Changelog","previous_headings":"","what":"tidyr 0.2.0","title":"tidyr 0.2.0","text":"CRAN release: 2014-12-05","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-functions-0-2-0","dir":"Changelog","previous_headings":"","what":"New functions","title":"tidyr 0.2.0","text":"Experimental expand() function (#21). Experiment unnest() function converting named lists data frames. (#3, #22)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-2-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.2.0","text":"extract_numeric() preserves negative signs (#20). gather() better defaults key value supplied. ... omitted, gather() selects columns (#28). Performance now comparable reshape2::melt() (#18). separate() gains extra argument lets control happens extra pieces. 
default throw “error”, can also “merge” “drop”. spread() gains drop argument, allows preserve missing factor levels (#25). converts factor value variables character vectors, instead embedding matrix inside data frame (#35).","code":""}] +[{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. 
community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to tidyr","title":"Contributing to tidyr","text":"outlines propose change tidyr. 
detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to tidyr","text":"Small typos grammatical errors documentation may edited directly using GitHub web interface, long changes made source file. YES: edit roxygen comment .R file R/. : edit .Rd file man/.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"prerequisites","dir":"","previous_headings":"","what":"Prerequisites","title":"Contributing to tidyr","text":"make substantial pull request, always file issue make sure someone team agrees ’s problem. ’ve found bug, create associated issue illustrate bug minimal reprex.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"","what":"Pull request process","title":"Contributing to tidyr","text":"recommend create Git branch pull request (PR). Look Travis AppVeyor build status making changes. README contain badges continuous integration services used package. New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat. Contributions test cases included easier accept. user-facing changes, add bullet top NEWS.md current development version header describing changes made followed GitHub username, links relevant issue(s)/PR(s).","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to tidyr","text":"Please note tidyr project released Contributor Code Conduct. contributing project agree abide terms.","code":""},{"path":"https://tidyr.tidyverse.org/dev/CONTRIBUTING.html","id":"see-tidyverse-development-contributing-guide","dir":"","previous_headings":"","what":"See tidyverse development contributing guide","title":"Contributing to tidyr","text":"details.","code":""},{"path":"https://tidyr.tidyverse.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 tidyr authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://tidyr.tidyverse.org/dev/SUPPORT.html","id":null,"dir":"","previous_headings":"","what":"Getting help with tidyr","title":"Getting help with tidyr","text":"Thanks using tidyr. filing issue, places explore pieces put together make process smooth possible. Start making minimal reproducible example using reprex package. haven’t heard used reprex , ’re treat! Seriously, reprex make R-question-asking endeavors easier (pretty insane ROI five ten minutes ’ll take learn ’s ). 
additional reprex pointers, check Get help! section tidyverse site. Armed reprex, next step figure ask. ’s question: start forum.posit.co, /StackOverflow. people answer questions. ’s bug: ’re right place, file issue. ’re sure: let community help figure ! problem bug feature request, can easily return report . opening new issue, sure search issues pull requests make sure bug hasn’t reported /already fixed development version. default, search pre-populated :issue :open. can edit qualifiers (e.g. :pr, :closed) needed. example, ’d simply remove :open search issues repo, open closed. right place, need file issue, please review “File issues” paragraph tidyverse contributing guidelines. Thanks help!","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"In packages","text":"vignette serves two distinct, related, purposes: documents general best practices using tidyr package, inspired using ggplot2 packages. describes migration patterns transition tidyr v0.8.3 v1.0.0. release includes breaking changes nest() unnest() order increase consistency within tidyr rest tidyverse. go , ’ll attach packages use, expose version tidyr, make small dataset use examples.","code":"library(tidyr) library(dplyr, warn.conflicts = FALSE) library(purrr) packageVersion(\"tidyr\") #> [1] '1.3.1.9000' mini_iris <- as_tibble(iris)[c(1, 2, 51, 52, 101, 102), ] mini_iris #> # A tibble: 6 × 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3 1.4 0.2 setosa #> 3 7 3.2 4.7 1.4 versicolor #> 4 6.4 3.2 4.5 1.5 versicolor #> 5 6.3 3.3 6 2.5 virginica #> 6 5.8 2.7 5.1 1.9 virginica"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"using-tidyr-in-packages","dir":"Articles","previous_headings":"","what":"Using tidyr in packages","title":"In packages","text":"assume ’re already familiar using tidyr functions, described vignette(\"programming.Rmd\"). two important considerations using tidyr package: avoid R CMD CHECK notes using fixed variable names. alert upcoming changes development version tidyr.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"fixed-column-names","dir":"Articles","previous_headings":"Using tidyr in packages","what":"Fixed column names","title":"In packages","text":"know column names, code works way regardless whether inside outside package: R CMD check warn undefined global variables (Petal.Length, Petal.Width, Sepal.Length, Sepal.Width), doesn’t know nest() looking variables inside mini_iris (.e. Petal.Length friends data-variables, env-variables). easiest way silence note use all_of(). all_of() tidyselect helper (like starts_with(), ends_with(), etc.) takes column names stored strings: Alternatively, may want use any_of() OK specified variables found input data. tidyselect package offers entire family select helpers. 
probably already familiar using dplyr::select().","code":"mini_iris %>% nest( petal = c(Petal.Length, Petal.Width), sepal = c(Sepal.Length, Sepal.Width) ) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica mini_iris %>% nest( petal = all_of(c(\"Petal.Length\", \"Petal.Width\")), sepal = all_of(c(\"Sepal.Length\", \"Sepal.Width\")) ) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica "},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"continuous-integration","dir":"Articles","previous_headings":"Using tidyr in packages","what":"Continuous integration","title":"In packages","text":"Hopefully ’ve already adopted continuous integration package, R CMD check (includes tests) run regular basis, e.g. every time push changes package’s source GitHub similar. tidyverse team currently relies heavily GitHub Actions, example. usethis::use_github_action() can help get started. recommend adding workflow targets devel version tidyr. ? Always? package tightly coupled tidyr, consider leaving place time, know changes tidyr affect package. Right tidyr release? everyone else, add (re-activate existing) tidyr-devel workflow period preceding major tidyr release potential breaking changes, especially ’ve contacted reverse dependency checks. Example GitHub Actions workflow tests package development version tidyr: GitHub Actions evolving landscape, can always mine workflows tidyr (tidyverse/tidyr/.github/workflows) main r-lib/actions repo ideas.","code":"on: push: branches: - main pull_request: branches: - main name: R-CMD-check-tidyr-devel jobs: R-CMD-check: runs-on: macOS-latest steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-r@v2 - name: Install dependencies run: | install.packages(c(\"remotes\", \"rcmdcheck\")) remotes::install_deps(dependencies = TRUE) remotes::install_github(\"tidyverse/tidyr\") shell: Rscript {0} - name: Check run: rcmdcheck::rcmdcheck(args = \"--no-manual\", error_on = \"error\") shell: Rscript {0}"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"tidyr-v0-8-3---v1-0-0","dir":"Articles","previous_headings":"","what":"tidyr v0.8.3 -> v1.0.0","title":"In packages","text":"v1.0.0 makes considerable changes interface nest() unnest() order bring line newer tidyverse conventions. tried make functions backward compatible possible give informative warning messages, cover 100% use cases, may need change package code. guide help minimum pain. Ideally, ’ll tweak package works tidyr 0.8.3 tidyr 1.0.0. makes life considerably easier means ’s need coordinate CRAN submissions - can submit package works tidyr versions, submit tidyr CRAN. section describes recommend practices , drawing general principles described https://design.tidyverse.org/changes-multivers.html. use continuous integration already, strongly recommend adding build tests development version tidyr; see details. section briefly describes run different code different versions tidyr, goes major changes might require workarounds: nest() unnest() get new interfaces. nest() preserves groups. nest_() unnest_() defunct. ’re struggling problem ’s described , please reach via github email can help .","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"conditional-code","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"Conditional code","title":"In packages","text":"Sometimes ’ll able write code works v0.8.3 v1.0.0. 
often requires code ’s particularly natural either version ’d better (temporarily) separate code paths, containing non-contrived code. get re-use existing code “old” branch, eventually phased , write clean, forward-looking code “new” branch. basic approach looks like . First define function returns TRUE new versions tidyr: highly recommend keeping function provides obvious place jot transition notes package, makes easier remove transitional code later . Another benefit tidyr version determined run time, build time, therefore detect user’s current tidyr version. functions, use statement call different code different versions: new code uses function exists tidyr 1.0.0, get NOTE R CMD check: one notes can explain CRAN submission comments. Just mention ’s forward compatibility tidyr 1.0.0, CRAN let package .","code":"tidyr_new_interface <- function() { packageVersion(\"tidyr\") > \"0.8.99\" } my_function_inside_a_package <- function(...) # my code here if (tidyr_new_interface()) { # Freshly written code for v1.0.0 out <- tidyr::nest(df, data = any_of(c(\"x\", \"y\", \"z\"))) } else { # Existing code for v0.8.3 out <- tidyr::nest(df, x, y, z) } # more code here }"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"new-syntax-for-nest","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"New syntax for nest()","title":"In packages","text":"changed: --nested columns longer accepted “loose parts”. new list-column’s name longer provided via .key argument. Now use construct like : new_col = . changed: use ... metadata problematic pattern ’re moving away . https://design.tidyverse.org/dots-data.html new_col = construct lets us create multiple nested list-columns (“multi-nest”). examples: need quick dirty fix without think, just call nest_legacy() instead nest(). ’s nest() v0.8.3:","code":"mini_iris %>% nest(petal = matches(\"Petal\"), sepal = matches(\"Sepal\")) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica # v0.8.3 mini_iris %>% nest(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, .key = \"my_data\") # v1.0.0 mini_iris %>% nest(my_data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) # v1.0.0 avoiding R CMD check NOTE mini_iris %>% nest(my_data = any_of(c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\"))) # or equivalently: mini_iris %>% nest(my_data = !any_of(\"Species\")) if (tidyr_new_interface()) { out <- tidyr::nest_legacy(df, x, y, z) } else { out <- tidyr::nest(df, x, y, z) }"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"new-syntax-for-unnest","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"New syntax for unnest()","title":"In packages","text":"changed: --unnested columns must now specified explicitly, instead defaulting list-columns. also deprecates .drop .preserve. .sep deprecated replaced names_sep. unnest() uses emerging tidyverse standard disambiguate duplicated names. Use names_repair = tidyr_legacy request previous approach. .id deprecated can easily replaced creating column names prior unnest(), e.g. upstream call mutate(). changed: use ... metadata problematic pattern ’re moving away . https://design.tidyverse.org/dots-data.html changes details arguments relate features rolling across multiple packages tidyverse. example, ptype exposes prototype support new vctrs package. names_repair specifies duplicated non-syntactic names, consistent tibble readxl. 
: need quick dirty fix without think, just call unnest_legacy() instead unnest(). ’s unnest() v0.8.3:","code":"# v0.8.3 df %>% unnest(x, .id = \"id\") # v1.0.0 df %>% mutate(id = names(x)) %>% unnest(x)) nested <- mini_iris %>% nest(my_data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) # v0.8.3 automatically unnests list-cols nested %>% unnest() # v1.0.0 must be told which columns to unnest nested %>% unnest(any_of(\"my_data\")) if (tidyr_new_interface()) { out <- tidyr::unnest_legacy(df) } else { out <- tidyr::unnest(df) }"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"nest-preserves-groups","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"nest() preserves groups","title":"In packages","text":"changed: nest() now preserves groups present input. changed: reflect growing support grouped data frames, especially recent releases dplyr. See, example, dplyr::group_modify(), group_map(), friends. fact nest() now preserves groups problematic downstream, choices: Apply ungroup() result. level pragmatism suggests, however, least consider next two options. never grouped first place. Eliminate group_by() call specify columns nested versus nested directly nest(). Adjust downstream code accommodate grouping. Imagine used group_by() nest() mini_iris, computed list-column outside data frame. now try add back data post hoc: fails df grouped mutate() group-aware, ’s hard add completely external variable. pragmatically ungroup()ing, can ? One option work inside data frame, .e. bring map() inside mutate(), design problem away: , somehow, grouping seems appropriate working inside data frame option, tibble::add_column() group-unaware. lets add external data grouped data frame.","code":"(df <- mini_iris %>% group_by(Species) %>% nest()) #> # A tibble: 3 × 2 #> # Groups: Species [3] #> Species data #> #> 1 setosa #> 2 versicolor #> 3 virginica (external_variable <- map_int(df$data, nrow)) #> [1] 2 2 2 df %>% mutate(n_rows = external_variable) #> Error in `mutate()`: #> ℹ In argument: `n_rows = external_variable`. #> ℹ In group 1: `Species = setosa`. #> Caused by error: #> ! `n_rows` must be size 1, not 3. df %>% mutate(n_rows = map_int(data, nrow)) #> # A tibble: 3 × 3 #> # Groups: Species [3] #> Species data n_rows #> #> 1 setosa 2 #> 2 versicolor 2 #> 3 virginica 2 df %>% tibble::add_column(n_rows = external_variable) #> # A tibble: 3 × 3 #> # Groups: Species [3] #> Species data n_rows #> #> 1 setosa 2 #> 2 versicolor 2 #> 3 virginica 2"},{"path":"https://tidyr.tidyverse.org/dev/articles/in-packages.html","id":"nest_-and-unnest_-are-defunct","dir":"Articles","previous_headings":"tidyr v0.8.3 -> v1.0.0","what":"nest_() and unnest_() are defunct","title":"In packages","text":"changed: nest_() unnest_() longer work changed: Specialized standard evaluation versions functions, e.g., foo_() complement foo(). older lazyeval framework. :","code":"# v0.8.3 mini_iris %>% nest_( key_col = \"my_data\", nest_cols = c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\") ) nested %>% unnest_(~ my_data) # v1.0.0 mini_iris %>% nest(my_data = any_of(c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\"))) nested %>% unnest(any_of(\"my_data\"))"},{"path":"https://tidyr.tidyverse.org/dev/articles/nest.html","id":"basics","dir":"Articles","previous_headings":"","what":"Basics","title":"Nested data","text":"nested data frame data frame one () columns list data frames. 
can create simple nested data frames hand: (possible create list-columns regular data frames, just tibbles, ’s considerably work default behaviour data.frame() treat lists lists columns.) commonly ’ll create tidyr::nest(): nest() specifies variables nested inside; alternative use dplyr::group_by() describe variables kept outside. think nesting easiest understand connection grouped data: row output corresponds one group input. ’ll see shortly particularly convenient per-group objects. opposite nest() unnest(). give name list-column containing data frames, row-binds data frames together, repeating outer columns right number times line .","code":"df1 <- tibble( g = c(1, 2, 3), data = list( tibble(x = 1, y = 2), tibble(x = 4:5, y = 6:7), tibble(x = 10) ) ) df1 #> # A tibble: 3 × 2 #> g data #> #> 1 1 #> 2 2 #> 3 3 df2 <- tribble( ~g, ~x, ~y, 1, 1, 2, 2, 4, 6, 2, 5, 7, 3, 10, NA ) df2 %>% nest(data = c(x, y)) #> # A tibble: 3 × 2 #> g data #> #> 1 1 #> 2 2 #> 3 3 df2 %>% group_by(g) %>% nest() #> # A tibble: 3 × 2 #> # Groups: g [3] #> g data #> #> 1 1 #> 2 2 #> 3 3 df1 %>% unnest(data) #> # A tibble: 4 × 3 #> g x y #> #> 1 1 1 2 #> 2 2 4 6 #> 3 2 5 7 #> 4 3 10 NA"},{"path":"https://tidyr.tidyverse.org/dev/articles/nest.html","id":"nested-data-and-models","dir":"Articles","previous_headings":"","what":"Nested data and models","title":"Nested data","text":"Nested data great fit problems one something group. common place arises ’re fitting multiple models. list data frames, ’s natural produce list models: even produce list predictions: workflow works particularly well conjunction broom, makes easy turn models tidy data frames can unnest()ed get back flat data frames. can see bigger example broom dplyr vignette.","code":"mtcars_nested <- mtcars %>% group_by(cyl) %>% nest() mtcars_nested #> # A tibble: 3 × 2 #> # Groups: cyl [3] #> cyl data #> #> 1 6 #> 2 4 #> 3 8 mtcars_nested <- mtcars_nested %>% mutate(model = map(data, function(df) lm(mpg ~ wt, data = df))) mtcars_nested #> # A tibble: 3 × 3 #> # Groups: cyl [3] #> cyl data model #> #> 1 6 #> 2 4 #> 3 8 mtcars_nested <- mtcars_nested %>% mutate(model = map(model, predict)) mtcars_nested #> # A tibble: 3 × 3 #> # Groups: cyl [3] #> cyl data model #> #> 1 6 #> 2 4 #> 3 8 "},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Pivoting","text":"vignette describes use new pivot_longer() pivot_wider() functions. goal improve usability gather() spread(), incorporate state---art features found packages. time, ’s obvious something fundamentally wrong design spread() gather(). Many people don’t find names intuitive find hard remember direction corresponds spreading gathering. also seems surprisingly hard remember arguments functions, meaning many people (including !) consult documentation every time. two important new features inspired R packages advancing reshaping R: pivot_longer() can work multiple value variables may different types, inspired enhanced melt() dcast() functions provided data.table package Matt Dowle Arun Srinivasan. pivot_longer() pivot_wider() can take data frame specifies precisely metadata stored column names becomes data variables (vice versa), inspired cdata package John Mount Nina Zumel. vignette, ’ll learn key ideas behind pivot_longer() pivot_wider() see used solve variety data reshaping challenges ranging simple complex. begin ’ll load needed packages. 
real analysis code, ’d imagine ’d library(tidyverse), can’t since vignette embedded package.","code":"library(tidyr) library(dplyr) library(readr)"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"longer","dir":"Articles","previous_headings":"","what":"Longer","title":"Pivoting","text":"pivot_longer() makes datasets longer increasing number rows decreasing number columns. don’t believe makes sense describe dataset “long form”. Length relative term, can say (e.g.) dataset longer dataset B. pivot_longer() commonly needed tidy wild-caught datasets often optimise ease data entry ease comparison rather ease analysis. following sections show use pivot_longer() wide range realistic datasets.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"pew","dir":"Articles","previous_headings":"Longer","what":"String data in column names","title":"Pivoting","text":"relig_income dataset stores counts based survey (among things) asked people religion annual income: dataset contains three variables: religion, stored rows, income spread across column names, count stored cell values. tidy use pivot_longer(): first argument dataset reshape, relig_income. cols describes columns need reshaped. case, ’s every column apart religion. names_to gives name variable created data stored column names, .e. income. values_to gives name variable created data stored cell value, .e. count. Neither names_to values_to column exists relig_income, provide strings surrounded quotes.","code":"relig_income #> # A tibble: 18 × 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` #> #> 1 Agnostic 27 34 60 81 76 137 #> 2 Atheist 12 27 37 52 35 70 #> 3 Buddhist 27 21 30 34 33 58 #> 4 Catholic 418 617 732 670 638 1116 #> 5 Don’t know/r… 15 14 15 11 10 35 #> 6 Evangelical … 575 869 1064 982 881 1486 #> 7 Hindu 1 9 7 9 11 34 #> 8 Historically… 228 244 236 238 197 223 #> 9 Jehovah's Wi… 20 27 24 24 21 30 #> 10 Jewish 19 19 25 25 30 95 #> # ℹ 8 more rows #> # ℹ 4 more variables: `$75-100k` , `$100-150k` , `>150k` , #> # `Don't know/refused` relig_income %>% pivot_longer( cols = !religion, names_to = \"income\", values_to = \"count\" ) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"billboard","dir":"Articles","previous_headings":"Longer","what":"Numeric data in column names","title":"Pivoting","text":"billboard dataset records billboard rank songs year 2000. form similar relig_income data, data encoded column names really number, string. can start basic specification relig_income dataset. want names become variable called week, values become variable called rank. also use values_drop_na drop rows correspond missing values. every song stays charts 76 weeks, structure input data force creation unnecessary explicit NAs. nice easily determine long song stayed charts, , ’ll need convert week variable integer. 
can using two additional arguments: names_prefix strips wk prefix, names_transform converts week integer: Alternatively, single argument using readr::parse_number() automatically strips non-numeric components:","code":"billboard #> # A tibble: 317 × 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 #> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 #> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA #> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 #> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 #> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 #> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA #> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 #> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 #> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 #> # ℹ 307 more rows #> # ℹ 69 more variables: wk8 , wk9 , wk10 , wk11 , #> # wk12 , wk13 , wk14 , wk15 , wk16 , #> # wk17 , wk18 , wk19 , wk20 , wk21 , #> # wk22 , wk23 , wk24 , wk25 , wk26 , #> # wk27 , wk28 , wk29 , wk30 , wk31 , #> # wk32 , wk33 , wk34 , wk35 , wk36 , … billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", values_to = \"rank\", values_drop_na = TRUE ) #> # A tibble: 5,307 × 5 #> artist track date.entered week rank #> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87 #> 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92 #> # ℹ 5,297 more rows billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", names_prefix = \"wk\", names_transform = as.integer, values_to = \"rank\", values_drop_na = TRUE, ) billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", names_transform = readr::parse_number, values_to = \"rank\", values_drop_na = TRUE, )"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"many-variables-in-column-names","dir":"Articles","previous_headings":"Longer","what":"Many variables in column names","title":"Pivoting","text":"challenging situation occurs multiple variables crammed column names. example, take dataset: country, iso2, iso3, year already variables, can left . columns new_sp_m014 newrel_f65 encode four variables names: new_/new prefix indicates counts new cases. dataset contains new cases, ’ll ignore ’s constant. sp/rel/ep describe case diagnosed. m/f gives gender. 014/1524/2535/3544/4554/65 supplies age range. can break variables specifying multiple column names names_to, either providing names_sep names_pattern. names_pattern natural fit. similar interface extract: give regular expression containing groups (defined ()) puts group column. go one step use readr functions convert gender age factors. think good practice categorical variables known set values. 
way little efficient mutate fact, pivot_longer() transform one occurence name mutate() need transform many repetitions.","code":"who #> # A tibble: 7,240 × 60 #> country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 #> #> 1 Afghanistan AF AFG 1980 NA NA NA #> 2 Afghanistan AF AFG 1981 NA NA NA #> 3 Afghanistan AF AFG 1982 NA NA NA #> 4 Afghanistan AF AFG 1983 NA NA NA #> 5 Afghanistan AF AFG 1984 NA NA NA #> 6 Afghanistan AF AFG 1985 NA NA NA #> 7 Afghanistan AF AFG 1986 NA NA NA #> 8 Afghanistan AF AFG 1987 NA NA NA #> 9 Afghanistan AF AFG 1988 NA NA NA #> 10 Afghanistan AF AFG 1989 NA NA NA #> # ℹ 7,230 more rows #> # ℹ 53 more variables: new_sp_m3544 , new_sp_m4554 , #> # new_sp_m5564 , new_sp_m65 , new_sp_f014 , #> # new_sp_f1524 , new_sp_f2534 , new_sp_f3544 , #> # new_sp_f4554 , new_sp_f5564 , new_sp_f65 , #> # new_sn_m014 , new_sn_m1524 , new_sn_m2534 , #> # new_sn_m3544 , new_sn_m4554 , new_sn_m5564 , … who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c(\"diagnosis\", \"gender\", \"age\"), names_pattern = \"new_?(.*)_(.)(.*)\", values_to = \"count\" ) #> # A tibble: 405,440 × 8 #> country iso2 iso3 year diagnosis gender age count #> #> 1 Afghanistan AF AFG 1980 sp m 014 NA #> 2 Afghanistan AF AFG 1980 sp m 1524 NA #> 3 Afghanistan AF AFG 1980 sp m 2534 NA #> 4 Afghanistan AF AFG 1980 sp m 3544 NA #> 5 Afghanistan AF AFG 1980 sp m 4554 NA #> 6 Afghanistan AF AFG 1980 sp m 5564 NA #> 7 Afghanistan AF AFG 1980 sp m 65 NA #> 8 Afghanistan AF AFG 1980 sp f 014 NA #> 9 Afghanistan AF AFG 1980 sp f 1524 NA #> 10 Afghanistan AF AFG 1980 sp f 2534 NA #> # ℹ 405,430 more rows who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c(\"diagnosis\", \"gender\", \"age\"), names_pattern = \"new_?(.*)_(.)(.*)\", names_transform = list( gender = ~ readr::parse_factor(.x, levels = c(\"f\", \"m\")), age = ~ readr::parse_factor( .x, levels = c(\"014\", \"1524\", \"2534\", \"3544\", \"4554\", \"5564\", \"65\"), ordered = TRUE ) ), values_to = \"count\", )"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"multiple-observations-per-row","dir":"Articles","previous_headings":"Longer","what":"Multiple observations per row","title":"Pivoting","text":"far, working data frames one observation per row, many important pivoting problems involve multiple observations per row. can usually recognise case name column want appear output part column name input. section, ’ll learn pivot sort data. following example adapted data.table vignette, inspiration tidyr’s solution problem. Note two pieces information (values) child: name dob (date birth). need go separate columns result. supply multiple variables names_to, using names_sep split variable name. Note special name .value: tells pivot_longer() part column name specifies “value” measured (become variable output). Note use values_drop_na = TRUE: input shape forces creation explicit missing variables observations don’t exist. similar problem problem also exists anscombe dataset built base R: dataset contains four pairs variables (x1 y1, x2 y2, etc) underlie Anscombe’s quartet, collection four datasets summary statistics (mean, sd, correlation etc), quite different data. want produce dataset columns set, x y. Setting cols_vary \"slowest\" groups values columns x1 y1 together rows output moving x2 y2. argument often produces intuitively ordered output pivoting every column dataset. similar situation can arise panel data. example, take example dataset provided Thomas Leeper. 
can tidy using approach anscombe:","code":"household #> # A tibble: 5 × 5 #> family dob_child1 dob_child2 name_child1 name_child2 #> #> 1 1 1998-11-26 2000-01-29 Susan Jose #> 2 2 1996-06-22 NA Mark NA #> 3 3 2002-07-11 2004-04-05 Sam Seth #> 4 4 2004-10-10 2009-08-27 Craig Khai #> 5 5 2000-12-05 2005-02-28 Parker Gracie household %>% pivot_longer( cols = !family, names_to = c(\".value\", \"child\"), names_sep = \"_\", values_drop_na = TRUE ) #> # A tibble: 9 × 4 #> family child dob name #> #> 1 1 child1 1998-11-26 Susan #> 2 1 child2 2000-01-29 Jose #> 3 2 child1 1996-06-22 Mark #> 4 3 child1 2002-07-11 Sam #> 5 3 child2 2004-04-05 Seth #> 6 4 child1 2004-10-10 Craig #> 7 4 child2 2009-08-27 Khai #> 8 5 child1 2000-12-05 Parker #> 9 5 child2 2005-02-28 Gracie anscombe #> x1 x2 x3 x4 y1 y2 y3 y4 #> 1 10 10 10 8 8.04 9.14 7.46 6.58 #> 2 8 8 8 8 6.95 8.14 6.77 5.76 #> 3 13 13 13 8 7.58 8.74 12.74 7.71 #> 4 9 9 9 8 8.81 8.77 7.11 8.84 #> 5 11 11 11 8 8.33 9.26 7.81 8.47 #> 6 14 14 14 8 9.96 8.10 8.84 7.04 #> 7 6 6 6 8 7.24 6.13 6.08 5.25 #> 8 4 4 4 19 4.26 3.10 5.39 12.50 #> 9 12 12 12 8 10.84 9.13 8.15 5.56 #> 10 7 7 7 8 4.82 7.26 6.42 7.91 #> 11 5 5 5 8 5.68 4.74 5.73 6.89 anscombe %>% pivot_longer( cols = everything(), cols_vary = \"slowest\", names_to = c(\".value\", \"set\"), names_pattern = \"(.)(.)\" ) #> # A tibble: 44 × 3 #> set x y #> #> 1 1 10 8.04 #> 2 1 8 6.95 #> 3 1 13 7.58 #> 4 1 9 8.81 #> 5 1 11 8.33 #> 6 1 14 9.96 #> 7 1 6 7.24 #> 8 1 4 4.26 #> 9 1 12 10.8 #> 10 1 7 4.82 #> # ℹ 34 more rows pnl <- tibble( x = 1:4, a = c(1, 1,0, 0), b = c(0, 1, 1, 1), y1 = rnorm(4), y2 = rnorm(4), z1 = rep(3, 4), z2 = rep(-2, 4), ) pnl %>% pivot_longer( cols = !c(x, a, b), names_to = c(\".value\", \"time\"), names_pattern = \"(.)(.)\" ) #> # A tibble: 8 × 6 #> x a b time y z #> #> 1 1 1 0 1 -1.40 3 #> 2 1 1 0 2 0.622 -2 #> 3 2 1 1 1 0.255 3 #> 4 2 1 1 2 1.15 -2 #> 5 3 0 1 1 -2.44 3 #> 6 3 0 1 2 -1.82 -2 #> 7 4 0 1 1 -0.00557 3 #> 8 4 0 1 2 -0.247 -2"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"wider","dir":"Articles","previous_headings":"","what":"Wider","title":"Pivoting","text":"pivot_wider() opposite pivot_longer(): makes dataset wider increasing number columns decreasing number rows. ’s relatively rare need pivot_wider() make tidy data, ’s often useful creating summary tables presentation, data format needed tools.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"capture-recapture-data","dir":"Articles","previous_headings":"Wider","what":"Capture-recapture data","title":"Pivoting","text":"fish_encounters dataset, contributed Myfanwy Johnston, describes fish swimming river detected automatic monitoring stations: Many tools used analyse data need form station column: dataset records fish detected station - doesn’t record wasn’t detected (common type data). means output data filled NAs. 
However, case know absence record means fish seen, can ask pivot_wider() fill missing values zeros:","code":"fish_encounters #> # A tibble: 114 × 3 #> fish station seen #> #> 1 4842 Release 1 #> 2 4842 I80_1 1 #> 3 4842 Lisbon 1 #> 4 4842 Rstr 1 #> 5 4842 Base_TD 1 #> 6 4842 BCE 1 #> 7 4842 BCW 1 #> 8 4842 BCE2 1 #> 9 4842 BCW2 1 #> 10 4842 MAE 1 #> # ℹ 104 more rows fish_encounters %>% pivot_wider( names_from = station, values_from = seen ) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 NA NA NA NA NA #> 5 4847 1 1 1 NA NA NA NA NA NA NA #> 6 4848 1 1 1 1 NA NA NA NA NA NA #> 7 4849 1 1 NA NA NA NA NA NA NA NA #> 8 4850 1 1 NA 1 1 1 1 NA NA NA #> 9 4851 1 1 NA NA NA NA NA NA NA NA #> 10 4854 1 1 NA NA NA NA NA NA NA NA #> # ℹ 9 more rows #> # ℹ 1 more variable: MAW fish_encounters %>% pivot_wider( names_from = station, values_from = seen, values_fill = 0 ) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 0 0 0 0 0 #> 5 4847 1 1 1 0 0 0 0 0 0 0 #> 6 4848 1 1 1 1 0 0 0 0 0 0 #> 7 4849 1 1 0 0 0 0 0 0 0 0 #> 8 4850 1 1 0 1 1 1 1 0 0 0 #> 9 4851 1 1 0 0 0 0 0 0 0 0 #> 10 4854 1 1 0 0 0 0 0 0 0 0 #> # ℹ 9 more rows #> # ℹ 1 more variable: MAW "},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"aggregation","dir":"Articles","previous_headings":"Wider","what":"Aggregation","title":"Pivoting","text":"can also use pivot_wider() perform simple aggregation. example, take warpbreaks dataset built base R (converted tibble better print method): designed experiment nine replicates every combination wool (B) tension (L, M, H): happens attempt pivot levels wool columns? get warning cell output corresponds multiple cells input. default behaviour produces list-columns, contain individual values. useful output summary statistics, e.g. mean breaks combination wool tension: complex summary operations, recommend summarising reshaping, simple cases ’s often convenient summarise within pivot_wider().","code":"warpbreaks <- warpbreaks %>% as_tibble() %>% select(wool, tension, breaks) warpbreaks #> # A tibble: 54 × 3 #> wool tension breaks #> #> 1 A L 26 #> 2 A L 30 #> 3 A L 54 #> 4 A L 25 #> 5 A L 70 #> 6 A L 52 #> 7 A L 51 #> 8 A L 26 #> 9 A L 67 #> 10 A M 18 #> # ℹ 44 more rows warpbreaks %>% count(wool, tension) #> # A tibble: 6 × 3 #> wool tension n #> #> 1 A L 9 #> 2 A M 9 #> 3 A H 9 #> 4 B L 9 #> 5 B M 9 #> 6 B H 9 warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks ) #> Warning: Values from `breaks` are not uniquely identified; output will contain #> list-cols. #> • Use `values_fn = list` to suppress this warning. #> • Use `values_fn = {summary_fun}` to summarise duplicates. #> • Use the following dplyr code to identify duplicates. 
#> {data} |> #> dplyr::summarise(n = dplyr::n(), .by = c(tension, wool)) |> #> dplyr::filter(n > 1L) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L #> 2 M #> 3 H warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = mean ) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L 44.6 28.2 #> 2 M 24 28.8 #> 3 H 24.6 18.8"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"generate-column-name-from-multiple-variables","dir":"Articles","previous_headings":"Wider","what":"Generate column name from multiple variables","title":"Pivoting","text":"Imagine, https://stackoverflow.com/questions/24929954, information containing combination product, country, year. tidy form might look like : want widen data one column combination product country. key specify multiple variables names_from: either names_from values_from select multiple variables, can control column names output constructed names_sep names_prefix, workhorse names_glue:","code":"production <- expand_grid( product = c(\"A\", \"B\"), country = c(\"AI\", \"EI\"), year = 2000:2014 ) %>% filter((product == \"A\" & country == \"AI\") | product == \"B\") %>% mutate(production = rnorm(nrow(.))) production #> # A tibble: 45 × 4 #> product country year production #> #> 1 A AI 2000 -0.244 #> 2 A AI 2001 -0.283 #> 3 A AI 2002 -0.554 #> 4 A AI 2003 0.629 #> 5 A AI 2004 2.07 #> 6 A AI 2005 -1.63 #> 7 A AI 2006 0.512 #> 8 A AI 2007 -1.86 #> 9 A AI 2008 -0.522 #> 10 A AI 2009 -0.0526 #> # ℹ 35 more rows production %>% pivot_wider( names_from = c(product, country), values_from = production ) #> # A tibble: 15 × 4 #> year A_AI B_AI B_EI #> #> 1 2000 -0.244 0.738 -0.313 #> 2 2001 -0.283 1.89 1.07 #> 3 2002 -0.554 -0.0974 0.0700 #> 4 2003 0.629 -0.936 -0.639 #> 5 2004 2.07 -0.0160 -0.0500 #> 6 2005 -1.63 -0.827 -0.251 #> 7 2006 0.512 -1.51 0.445 #> 8 2007 -1.86 0.935 2.76 #> 9 2008 -0.522 0.176 0.0465 #> 10 2009 -0.0526 0.244 0.578 #> # ℹ 5 more rows production %>% pivot_wider( names_from = c(product, country), values_from = production, names_sep = \".\", names_prefix = \"prod.\" ) #> # A tibble: 15 × 4 #> year prod.A.AI prod.B.AI prod.B.EI #> #> 1 2000 -0.244 0.738 -0.313 #> 2 2001 -0.283 1.89 1.07 #> 3 2002 -0.554 -0.0974 0.0700 #> 4 2003 0.629 -0.936 -0.639 #> 5 2004 2.07 -0.0160 -0.0500 #> 6 2005 -1.63 -0.827 -0.251 #> 7 2006 0.512 -1.51 0.445 #> 8 2007 -1.86 0.935 2.76 #> 9 2008 -0.522 0.176 0.0465 #> 10 2009 -0.0526 0.244 0.578 #> # ℹ 5 more rows production %>% pivot_wider( names_from = c(product, country), values_from = production, names_glue = \"prod_{product}_{country}\" ) #> # A tibble: 15 × 4 #> year prod_A_AI prod_B_AI prod_B_EI #> #> 1 2000 -0.244 0.738 -0.313 #> 2 2001 -0.283 1.89 1.07 #> 3 2002 -0.554 -0.0974 0.0700 #> 4 2003 0.629 -0.936 -0.639 #> 5 2004 2.07 -0.0160 -0.0500 #> 6 2005 -1.63 -0.827 -0.251 #> 7 2006 0.512 -1.51 0.445 #> 8 2007 -1.86 0.935 2.76 #> 9 2008 -0.522 0.176 0.0465 #> 10 2009 -0.0526 0.244 0.578 #> # ℹ 5 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"tidy-census","dir":"Articles","previous_headings":"Wider","what":"Tidy census","title":"Pivoting","text":"us_rent_income dataset contains information median income rent state US 2017 (American Community Survey, retrieved tidycensus package). 
estimate moe values columns, can supply values_from: Note name variable automatically appended output columns.","code":"us_rent_income #> # A tibble: 104 × 5 #> GEOID NAME variable estimate moe #> #> 1 01 Alabama income 24476 136 #> 2 01 Alabama rent 747 3 #> 3 02 Alaska income 32940 508 #> 4 02 Alaska rent 1200 13 #> 5 04 Arizona income 27517 148 #> 6 04 Arizona rent 972 4 #> 7 05 Arkansas income 23789 165 #> 8 05 Arkansas rent 709 5 #> 9 06 California income 29454 109 #> 10 06 California rent 1358 3 #> # ℹ 94 more rows us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"implicit-missing-values","dir":"Articles","previous_headings":"Wider","what":"Implicit missing values","title":"Pivoting","text":"Occasionally, ’ll come across data names variable encoded factor, data represented. pivot_wider() defaults generating columns values actually represented data, might want include column possible level case data changes future. names_expand argument turn implicit factor levels explicit ones, forcing represented result. also sorts column names using level order, produces intuitive results case. multiple names_from columns provided, names_expand generate Cartesian product possible combinations names_from values. Notice following data omitted rows percentage value 0. names_expand allows us make explicit pivot. related problem can occur implicit missing factor levels combinations id_cols. case, missing rows (rather columns) ’d like explicitly represent. example, ’ll modify daily data type column, pivot instead, keeping day id column. type levels represented columns, missing rows related unrepresented day factor levels. 
can use id_expand way used names_expand, expand (sort) implicit missing rows id_cols.","code":"weekdays <- c(\"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Sat\", \"Sun\") daily <- tibble( day = factor(c(\"Tue\", \"Thu\", \"Fri\", \"Mon\"), levels = weekdays), value = c(2, 3, 1, 5) ) daily #> # A tibble: 4 × 2 #> day value #> #> 1 Tue 2 #> 2 Thu 3 #> 3 Fri 1 #> 4 Mon 5 daily %>% pivot_wider( names_from = day, values_from = value ) #> # A tibble: 1 × 4 #> Tue Thu Fri Mon #> #> 1 2 3 1 5 daily %>% pivot_wider( names_from = day, values_from = value, names_expand = TRUE ) #> # A tibble: 1 × 7 #> Mon Tue Wed Thu Fri Sat Sun #> #> 1 5 2 NA 3 1 NA NA percentages <- tibble( year = c(2018, 2019, 2020, 2020), type = factor(c(\"A\", \"B\", \"A\", \"B\"), levels = c(\"A\", \"B\")), percentage = c(100, 100, 40, 60) ) percentages #> # A tibble: 4 × 3 #> year type percentage #> #> 1 2018 A 100 #> 2 2019 B 100 #> 3 2020 A 40 #> 4 2020 B 60 percentages %>% pivot_wider( names_from = c(year, type), values_from = percentage, names_expand = TRUE, values_fill = 0 ) #> # A tibble: 1 × 6 #> `2018_A` `2018_B` `2019_A` `2019_B` `2020_A` `2020_B` #> #> 1 100 0 0 100 40 60 daily <- mutate(daily, type = factor(c(\"A\", \"B\", \"B\", \"A\"))) daily #> # A tibble: 4 × 3 #> day value type #> #> 1 Tue 2 A #> 2 Thu 3 B #> 3 Fri 1 B #> 4 Mon 5 A daily %>% pivot_wider( names_from = type, values_from = value, values_fill = 0 ) #> # A tibble: 4 × 3 #> day A B #> #> 1 Tue 2 0 #> 2 Thu 0 3 #> 3 Fri 0 1 #> 4 Mon 5 0 daily %>% pivot_wider( names_from = type, values_from = value, values_fill = 0, id_expand = TRUE ) #> # A tibble: 7 × 3 #> day A B #> #> 1 Mon 5 0 #> 2 Tue 2 0 #> 3 Wed 0 0 #> 4 Thu 0 3 #> 5 Fri 0 1 #> 6 Sat 0 0 #> 7 Sun 0 0"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"unused-columns","dir":"Articles","previous_headings":"Wider","what":"Unused columns","title":"Pivoting","text":"Imagine ’ve found situation columns data completely unrelated pivoting process, ’d still like retain information somehow. example, updates ’d like pivot system column create one row summaries county’s system updates. typical pivot_wider() call, completely lose information date column. example, ’d like retain recent update date across systems particular county. accomplish can use unused_fn argument, allows us summarize values columns utilized pivoting process. 
can also retain data delay aggregation entirely using list() summary function.","code":"updates <- tibble( county = c(\"Wake\", \"Wake\", \"Wake\", \"Guilford\", \"Guilford\"), date = c(as.Date(\"2020-01-01\") + 0:2, as.Date(\"2020-01-03\") + 0:1), system = c(\"A\", \"B\", \"C\", \"A\", \"C\"), value = c(3.2, 4, 5.5, 2, 1.2) ) updates #> # A tibble: 5 × 4 #> county date system value #> #> 1 Wake 2020-01-01 A 3.2 #> 2 Wake 2020-01-02 B 4 #> 3 Wake 2020-01-03 C 5.5 #> 4 Guilford 2020-01-03 A 2 #> 5 Guilford 2020-01-04 C 1.2 updates %>% pivot_wider( id_cols = county, names_from = system, values_from = value ) #> # A tibble: 2 × 4 #> county A B C #> #> 1 Wake 3.2 4 5.5 #> 2 Guilford 2 NA 1.2 updates %>% pivot_wider( id_cols = county, names_from = system, values_from = value, unused_fn = list(date = max) ) #> # A tibble: 2 × 5 #> county A B C date #> #> 1 Wake 3.2 4 5.5 2020-01-03 #> 2 Guilford 2 NA 1.2 2020-01-04 updates %>% pivot_wider( id_cols = county, names_from = system, values_from = value, unused_fn = list(date = list) ) #> # A tibble: 2 × 5 #> county A B C date #> #> 1 Wake 3.2 4 5.5 #> 2 Guilford 2 NA 1.2 "},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"contact-list","dir":"Articles","previous_headings":"Wider","what":"Contact list","title":"Pivoting","text":"final challenge inspired Jiena Gu. Imagine contact list ’ve copied pasted website: challenging ’s variable identifies observations belong together. can fix noting every contact starts name, can create unique id counting every time see “name” field: Now unique identifier person, can pivot field value columns:","code":"contacts <- tribble( ~field, ~value, \"name\", \"Jiena McLellan\", \"company\", \"Toyota\", \"name\", \"John Smith\", \"company\", \"google\", \"email\", \"john@google.com\", \"name\", \"Huxley Ratcliffe\" ) contacts <- contacts %>% mutate( person_id = cumsum(field == \"name\") ) contacts #> # A tibble: 6 × 3 #> field value person_id #> #> 1 name Jiena McLellan 1 #> 2 company Toyota 1 #> 3 name John Smith 2 #> 4 company google 2 #> 5 email john@google.com 2 #> 6 name Huxley Ratcliffe 3 contacts %>% pivot_wider( names_from = field, values_from = value ) #> # A tibble: 3 × 4 #> person_id name company email #> #> 1 1 Jiena McLellan Toyota NA #> 2 2 John Smith google john@google.com #> 3 3 Huxley Ratcliffe NA NA"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"longer-then-wider","dir":"Articles","previous_headings":"","what":"Longer, then wider","title":"Pivoting","text":"problems can’t solved pivoting single direction. examples section show might combine pivot_longer() pivot_wider() solve complex problems.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"world-bank","dir":"Articles","previous_headings":"Longer, then wider","what":"World bank","title":"Pivoting","text":"world_bank_pop contains data World Bank population per country 2000 2018. goal produce tidy dataset variable column. ’s obvious exactly steps needed yet, ’ll start obvious problem: year spread across multiple columns. Next need consider indicator variable: SP.POP.GROW population growth, SP.POP.TOTL total population, SP.URB.* urban areas. 
Let’s split two variables: area (total urban) actual variable (population growth): Now can complete tidying pivoting variable value make TOTL GROW columns:","code":"world_bank_pop #> # A tibble: 1,064 × 20 #> country indicator `2000` `2001` `2002` `2003` `2004` `2005` #> #> 1 ABW SP.URB.TOTL 4.16e4 4.20e+4 4.22e+4 4.23e+4 4.23e+4 4.24e+4 #> 2 ABW SP.URB.GROW 1.66e0 9.56e-1 4.01e-1 1.97e-1 9.46e-2 1.94e-1 #> 3 ABW SP.POP.TOTL 8.91e4 9.07e+4 9.18e+4 9.27e+4 9.35e+4 9.45e+4 #> 4 ABW SP.POP.GROW 2.54e0 1.77e+0 1.19e+0 9.97e-1 9.01e-1 1.00e+0 #> 5 AFE SP.URB.TOTL 1.16e8 1.20e+8 1.24e+8 1.29e+8 1.34e+8 1.39e+8 #> 6 AFE SP.URB.GROW 3.60e0 3.66e+0 3.72e+0 3.71e+0 3.74e+0 3.81e+0 #> 7 AFE SP.POP.TOTL 4.02e8 4.12e+8 4.23e+8 4.34e+8 4.45e+8 4.57e+8 #> 8 AFE SP.POP.GROW 2.58e0 2.59e+0 2.61e+0 2.62e+0 2.64e+0 2.67e+0 #> 9 AFG SP.URB.TOTL 4.31e6 4.36e+6 4.67e+6 5.06e+6 5.30e+6 5.54e+6 #> 10 AFG SP.URB.GROW 1.86e0 1.15e+0 6.86e+0 7.95e+0 4.59e+0 4.47e+0 #> # ℹ 1,054 more rows #> # ℹ 12 more variables: `2006` , `2007` , `2008` , #> # `2009` , `2010` , `2011` , `2012` , `2013` , #> # `2014` , `2015` , `2016` , `2017` pop2 <- world_bank_pop %>% pivot_longer( cols = `2000`:`2017`, names_to = \"year\", values_to = \"value\" ) pop2 #> # A tibble: 19,152 × 4 #> country indicator year value #> #> 1 ABW SP.URB.TOTL 2000 41625 #> 2 ABW SP.URB.TOTL 2001 42025 #> 3 ABW SP.URB.TOTL 2002 42194 #> 4 ABW SP.URB.TOTL 2003 42277 #> 5 ABW SP.URB.TOTL 2004 42317 #> 6 ABW SP.URB.TOTL 2005 42399 #> 7 ABW SP.URB.TOTL 2006 42555 #> 8 ABW SP.URB.TOTL 2007 42729 #> 9 ABW SP.URB.TOTL 2008 42906 #> 10 ABW SP.URB.TOTL 2009 43079 #> # ℹ 19,142 more rows pop2 %>% count(indicator) #> # A tibble: 4 × 2 #> indicator n #> #> 1 SP.POP.GROW 4788 #> 2 SP.POP.TOTL 4788 #> 3 SP.URB.GROW 4788 #> 4 SP.URB.TOTL 4788 pop3 <- pop2 %>% separate(indicator, c(NA, \"area\", \"variable\")) pop3 #> # A tibble: 19,152 × 5 #> country area variable year value #> #> 1 ABW URB TOTL 2000 41625 #> 2 ABW URB TOTL 2001 42025 #> 3 ABW URB TOTL 2002 42194 #> 4 ABW URB TOTL 2003 42277 #> 5 ABW URB TOTL 2004 42317 #> 6 ABW URB TOTL 2005 42399 #> 7 ABW URB TOTL 2006 42555 #> 8 ABW URB TOTL 2007 42729 #> 9 ABW URB TOTL 2008 42906 #> 10 ABW URB TOTL 2009 43079 #> # ℹ 19,142 more rows pop3 %>% pivot_wider( names_from = variable, values_from = value ) #> # A tibble: 9,576 × 5 #> country area year TOTL GROW #> #> 1 ABW URB 2000 41625 1.66 #> 2 ABW URB 2001 42025 0.956 #> 3 ABW URB 2002 42194 0.401 #> 4 ABW URB 2003 42277 0.197 #> 5 ABW URB 2004 42317 0.0946 #> 6 ABW URB 2005 42399 0.194 #> 7 ABW URB 2006 42555 0.367 #> 8 ABW URB 2007 42729 0.408 #> 9 ABW URB 2008 42906 0.413 #> 10 ABW URB 2009 43079 0.402 #> # ℹ 9,566 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"multi-choice","dir":"Articles","previous_headings":"Longer, then wider","what":"Multi-choice","title":"Pivoting","text":"Based suggestion Maxime Wack, https://github.com/tidyverse/tidyr/issues/384), final example shows deal common way recording multiple choice data. Often get data follows: actual order isn’t important, ’d prefer individual questions columns. can achieve desired transformation two steps. 
First, make data longer, eliminating explicit NAs, adding column indicate choice chosen: make data wider, filling missing observations FALSE:","code":"multi <- tribble( ~id, ~choice1, ~choice2, ~choice3, 1, \"A\", \"B\", \"C\", 2, \"C\", \"B\", NA, 3, \"D\", NA, NA, 4, \"B\", \"D\", NA ) multi2 <- multi %>% pivot_longer( cols = !id, values_drop_na = TRUE ) %>% mutate(checked = TRUE) multi2 #> # A tibble: 8 × 4 #> id name value checked #> #> 1 1 choice1 A TRUE #> 2 1 choice2 B TRUE #> 3 1 choice3 C TRUE #> 4 2 choice1 C TRUE #> 5 2 choice2 B TRUE #> 6 3 choice1 D TRUE #> 7 4 choice1 B TRUE #> 8 4 choice2 D TRUE multi2 %>% pivot_wider( id_cols = id, names_from = value, values_from = checked, values_fill = FALSE ) #> # A tibble: 4 × 5 #> id A B C D #> #> 1 1 TRUE TRUE TRUE FALSE #> 2 2 FALSE TRUE TRUE FALSE #> 3 3 FALSE FALSE FALSE TRUE #> 4 4 FALSE TRUE FALSE TRUE"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"manual-specs","dir":"Articles","previous_headings":"","what":"Manual specs","title":"Pivoting","text":"arguments pivot_longer() pivot_wider() allow pivot wide range datasets. creativity people apply data structures seemingly endless, ’s quite possible encounter dataset can’t immediately see reshape pivot_longer() pivot_wider(). gain control pivoting, can instead create “spec” data frame describes exactly data stored column names becomes variables (vice versa). section introduces spec data structure, show use pivot_longer() pivot_wider() insufficient.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"longer-1","dir":"Articles","previous_headings":"Manual specs","what":"Longer","title":"Pivoting","text":"see works, lets return simplest case pivoting applied relig_income dataset. Now pivoting happens two steps: first create spec object (using build_longer_spec()) use describe pivoting operation: (gives result , just code. ’s need use , presented simple example using spec.) spec look like? ’s data frame one row column wide format version data present long format, two special columns start .: .name gives name column. .value gives name column values cells go . also one column spec column present long format data present wide format data. corresponds names_to argument pivot_longer() build_longer_spec() names_from argument pivot_wider() build_wider_spec(). example, income column character vector names columns pivoted.","code":"spec <- relig_income %>% build_longer_spec( cols = !religion, names_to = \"income\", values_to = \"count\" ) pivot_longer_spec(relig_income, spec) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows spec #> # A tibble: 10 × 3 #> .name .value income #> #> 1 <$10k count <$10k #> 2 $10-20k count $10-20k #> 3 $20-30k count $20-30k #> 4 $30-40k count $30-40k #> 5 $40-50k count $40-50k #> 6 $50-75k count $50-75k #> 7 $75-100k count $75-100k #> 8 $100-150k count $100-150k #> 9 >150k count >150k #> 10 Don't know/refused count Don't know/refused"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"wider-1","dir":"Articles","previous_headings":"Manual specs","what":"Wider","title":"Pivoting","text":"widen us_rent_income pivot_wider(). 
result ok, think improved: think better columns income, rent, income_moe, rent_moe, can achieve manual spec. current spec looks like : case, mutate spec carefully construct column names: Supplying spec pivot_wider() gives us result ’re looking :","code":"us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows spec1 <- us_rent_income %>% build_wider_spec( names_from = variable, values_from = c(estimate, moe) ) spec1 #> # A tibble: 4 × 3 #> .name .value variable #> #> 1 estimate_income estimate income #> 2 estimate_rent estimate rent #> 3 moe_income moe income #> 4 moe_rent moe rent spec2 <- spec1 %>% mutate( .name = paste0(variable, ifelse(.value == \"moe\", \"_moe\", \"\")) ) spec2 #> # A tibble: 4 × 3 #> .name .value variable #> #> 1 income estimate income #> 2 rent estimate rent #> 3 income_moe moe income #> 4 rent_moe moe rent us_rent_income %>% pivot_wider_spec(spec2) #> # A tibble: 52 × 6 #> GEOID NAME income rent income_moe rent_moe #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Columbia 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"by-hand","dir":"Articles","previous_headings":"Manual specs","what":"By hand","title":"Pivoting","text":"Sometimes ’s possible (convenient) compute spec, instead ’s convenient construct spec “hand”. example, take construction data, lightly modified Table 5 “completions” found https://www.census.gov/construction/nrc/index.html: sort data uncommon government agencies: column names actually belong different variables, summaries number units (1, 2-4, 5+) regions country (NE, NW, midwest, S, W). 
can easily describe tibble: yields following longer form: Note overlap units region variables; data really naturally described two independent tables.","code":"construction #> # A tibble: 9 × 9 #> Year Month `1 unit` `2 to 4 units` `5 units or more` Northeast Midwest #> #> 1 2018 Janua… 859 NA 348 114 169 #> 2 2018 Febru… 882 NA 400 138 160 #> 3 2018 March 862 NA 356 150 154 #> 4 2018 April 797 NA 447 144 196 #> 5 2018 May 875 NA 364 90 169 #> 6 2018 June 867 NA 342 76 170 #> 7 2018 July 829 NA 360 108 183 #> 8 2018 August 939 NA 286 90 205 #> 9 2018 Septe… 835 NA 304 117 175 #> # ℹ 2 more variables: South , West spec <- tribble( ~.name, ~.value, ~units, ~region, \"1 unit\", \"n\", \"1\", NA, \"2 to 4 units\", \"n\", \"2-4\", NA, \"5 units or more\", \"n\", \"5+\", NA, \"Northeast\", \"n\", NA, \"Northeast\", \"Midwest\", \"n\", NA, \"Midwest\", \"South\", \"n\", NA, \"South\", \"West\", \"n\", NA, \"West\", ) construction %>% pivot_longer_spec(spec) #> # A tibble: 63 × 5 #> Year Month units region n #> #> 1 2018 January 1 NA 859 #> 2 2018 January 2-4 NA NA #> 3 2018 January 5+ NA 348 #> 4 2018 January NA Northeast 114 #> 5 2018 January NA Midwest 169 #> 6 2018 January NA South 596 #> 7 2018 January NA West 339 #> 8 2018 February 1 NA 882 #> 9 2018 February 2-4 NA NA #> 10 2018 February 5+ NA 400 #> # ℹ 53 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/pivot.html","id":"theory","dir":"Articles","previous_headings":"Manual specs","what":"Theory","title":"Pivoting","text":"One neat property spec need spec pivot_longer() pivot_wider(). makes clear two operations symmetric: pivoting spec allows us precise exactly pivot_longer(df, spec = spec) changes shape df: nrow(df) * nrow(spec) rows, ncol(df) - nrow(spec) + ncol(spec) - 2 columns.","code":"construction %>% pivot_longer_spec(spec) %>% pivot_wider_spec(spec) #> # A tibble: 9 × 9 #> Year Month `1 unit` `2 to 4 units` `5 units or more` Northeast Midwest #> #> 1 2018 Janua… 859 NA 348 114 169 #> 2 2018 Febru… 882 NA 400 138 160 #> 3 2018 March 862 NA 356 150 154 #> 4 2018 April 797 NA 447 144 196 #> 5 2018 May 875 NA 364 90 169 #> 6 2018 June 867 NA 342 76 170 #> 7 2018 July 829 NA 360 108 183 #> 8 2018 August 939 NA 286 90 205 #> 9 2018 Septe… 835 NA 304 117 175 #> # ℹ 2 more variables: South , West "},{"path":"https://tidyr.tidyverse.org/dev/articles/programming.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Programming with tidyr","text":"tidyr verbs use tidy evaluation make interactive data exploration fast fluid. Tidy evaluation special type non-standard evaluation used throughout tidyverse. ’s typical tidyr code: Tidy evaluation can use !Species say “columns except Species”, without quote column name (\"Species\") refer enclosing data frame (iris$Species). Two basic forms tidy evaluation used tidyr: Tidy selection: drop_na(), fill(), pivot_longer()/pivot_wider(), nest()/unnest(), separate()/extract(), unite() let select variables based position, name, type (e.g. 1:3, starts_with(\"x\"), .numeric). Literally, can use techniques dplyr::select(). Data masking: expand(), crossing() nesting() let refer use data variables variables environment (.e. write my_variable df$my_variable). focus tidy selection , since ’s common. can learn data masking equivalent vignette dplyr: https://dplyr.tidyverse.org/dev/articles/programming.html. considerations writing tidyr code packages, please see vignette(\"-packages\"). 
’ve pointed tidyr’s tidy evaluation interface optimized interactive exploration. flip side adds challenges indirect use, .e. ’re working inside loop function. vignette shows overcome challenges. ’ll first go basics tidy selection data masking, talk use indirectly, show number recipes solve common problems. go , reveal version tidyr ’re using make small dataset use examples.","code":"library(tidyr) iris %>% nest(data = !Species) #> # A tibble: 3 × 2 #> Species data #> #> 1 setosa #> 2 versicolor #> 3 virginica packageVersion(\"tidyr\") #> [1] '1.3.1.9000' mini_iris <- as_tibble(iris)[c(1, 2, 51, 52, 101, 102), ] mini_iris #> # A tibble: 6 × 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3 1.4 0.2 setosa #> 3 7 3.2 4.7 1.4 versicolor #> 4 6.4 3.2 4.5 1.5 versicolor #> 5 6.3 3.3 6 2.5 virginica #> 6 5.8 2.7 5.1 1.9 virginica"},{"path":"https://tidyr.tidyverse.org/dev/articles/programming.html","id":"tidy-selection","dir":"Articles","previous_headings":"","what":"Tidy selection","title":"Programming with tidyr","text":"Underneath functions use tidy selection tidyselect package. provides miniature domain specific language makes easy select columns name, position, type. example: select(df, 1) selects first column; select(df, last_col()) selects last column. select(df, c(, b, c)) selects columns , b, c. select(df, starts_with(\"\")) selects columns whose name starts “”; select(df, ends_with(\"z\")) selects columns whose name ends “z”. select(df, (.numeric)) selects numeric columns. can see details ?tidyr_tidy_select.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/programming.html","id":"indirection","dir":"Articles","previous_headings":"Tidy selection","what":"Indirection","title":"Programming with tidyr","text":"Tidy selection makes common task easier cost making less common task harder. want use tidy select indirectly column specification stored intermediate variable, ’ll need learn new tools. three main cases comes : tidy-select specification function argument, must embrace argument surrounding doubled braces. character vector variable names, must use all_of() any_of() depending whether want function error variable found. functions allow write loops function takes variable names character vector. complicated cases, might want use tidyselect directly: Learn vignette(\"tidyselect\"). Note many tidyr functions use ... can easily select many variables, e.g. fill(df, x, y, z). now believe disadvantages approach outweigh benefits, interface better fill(df, c(x, y, z)). new functions select columns, please just use single argument ....","code":"nest_egg <- function(df, cols) { nest(df, egg = {{ cols }}) } nest_egg(mini_iris, !Species) #> # A tibble: 3 × 2 #> Species egg #> #> 1 setosa #> 2 versicolor #> 3 virginica nest_egg <- function(df, cols) { nest(df, egg = all_of(cols)) } vars <- c(\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\") nest_egg(mini_iris, vars) #> # A tibble: 3 × 2 #> Species egg #> #> 1 setosa #> 2 versicolor #> 3 virginica sel_vars <- function(df, cols) { tidyselect::eval_select(rlang::enquo(cols), df) } sel_vars(mini_iris, !Species) #> Sepal.Length Sepal.Width Petal.Length Petal.Width #> 1 2 3 4"},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Rectangling","text":"Rectangling art craft taking deeply nested list (often sourced wild caught JSON XML) taming tidy data set rows columns. 
three functions tidyr particularly useful rectangling: unnest_longer() takes element list-column makes new row. unnest_wider() takes element list-column makes new column. hoist() similar unnest_wider() plucks selected components, can reach multiple levels. (Alternative, complex inputs need rectangle nested list according specification, see tibblify package.) large number data rectangling problems can solved combining jsonlite::read_json() functions splash dplyr (largely eliminating prior approaches combined mutate() multiple purrr::map()s). Note jsonlite another important function called fromJSON(). don’t recommend performs automatic simplification (simplifyVector = TRUE). often works well, particularly simple cases, think ’re better rectangling know exactly ’s happening can easily handle complicated nested structures. illustrate techniques, ’ll use repurrrsive package, provides number deeply nested lists originally mostly captured web APIs.","code":"library(tidyr) library(dplyr) library(repurrrsive)"},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"github-users","dir":"Articles","previous_headings":"","what":"GitHub users","title":"Rectangling","text":"’ll start gh_users, list contains information six GitHub users. begin, put gh_users list data frame: seems bit counter-intuitive: first step making list simpler make complicated? data frame big advantage: bundles together multiple vectors everything tracked together single object. user named list, element represents column. two ways turn list components columns. unnest_wider() takes every component makes new column: case, many components don’t need can instead use hoist(). hoist() allows us pull selected components using syntax purrr::pluck(): hoist() removes named components user list-column, can think moving components inner list top-level data frame.","code":"users <- tibble(user = gh_users) names(users$user[[1]]) #> [1] \"login\" \"id\" \"avatar_url\" #> [4] \"gravatar_id\" \"url\" \"html_url\" #> [7] \"followers_url\" \"following_url\" \"gists_url\" #> [10] \"starred_url\" \"subscriptions_url\" \"organizations_url\" #> [13] \"repos_url\" \"events_url\" \"received_events_url\" #> [16] \"type\" \"site_admin\" \"name\" #> [19] \"company\" \"blog\" \"location\" #> [22] \"email\" \"hireable\" \"bio\" #> [25] \"public_repos\" \"public_gists\" \"followers\" #> [28] \"following\" \"created_at\" \"updated_at\" users %>% unnest_wider(user) #> # A tibble: 6 × 30 #> login id avatar_url gravatar_id url html_url followers_url #> #> 1 gaborcsardi 660288 https://a… \"\" http… https:/… https://api.… #> 2 jennybc 599454 https://a… \"\" http… https:/… https://api.… #> 3 jtleek 1571674 https://a… \"\" http… https:/… https://api.… #> 4 juliasilge 12505835 https://a… \"\" http… https:/… https://api.… #> 5 leeper 3505428 https://a… \"\" http… https:/… https://api.… #> 6 masalmon 8360597 https://a… \"\" http… https:/… https://api.… #> # ℹ 23 more variables: following_url , gists_url , #> # starred_url , subscriptions_url , organizations_url , #> # repos_url , events_url , received_events_url , #> # type , site_admin , name , company , blog , #> # location , email , hireable , bio , #> # public_repos , public_gists , followers , #> # following , created_at , updated_at users %>% hoist(user, followers = \"followers\", login = \"login\", url = \"html_url\" ) #> # A tibble: 6 × 4 #> followers login url user #> #> 1 303 gaborcsardi https://github.com/gaborcsardi #> 2 780 jennybc https://github.com/jennybc #> 3 3958 jtleek 
https://github.com/jtleek #> 4 115 juliasilge https://github.com/juliasilge #> 5 213 leeper https://github.com/leeper #> 6 34 masalmon https://github.com/masalmon "},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"github-repos","dir":"Articles","previous_headings":"","what":"GitHub repos","title":"Rectangling","text":"start gh_repos similarly, putting tibble: time elements repos list repositories belong user. observations, become new rows, use unnest_longer() rather unnest_wider(): can use unnest_wider() hoist(): Note use c(\"owner\", \"login\"): allows us reach two levels deep inside list. alternative approach pull just owner put element column:","code":"repos <- tibble(repo = gh_repos) repos #> # A tibble: 6 × 1 #> repo #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 repos <- repos %>% unnest_longer(repo) repos #> # A tibble: 176 × 1 #> repo #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #> # ℹ 166 more rows repos %>% hoist(repo, login = c(\"owner\", \"login\"), name = \"name\", homepage = \"homepage\", watchers = \"watchers_count\" ) #> # A tibble: 176 × 5 #> login name homepage watchers repo #> #> 1 gaborcsardi after NA 5 #> 2 gaborcsardi argufy NA 19 #> 3 gaborcsardi ask NA 5 #> 4 gaborcsardi baseimports NA 0 #> 5 gaborcsardi citest NA 0 #> 6 gaborcsardi clisymbols \"\" 18 #> 7 gaborcsardi cmaker NA 0 #> 8 gaborcsardi cmark NA 0 #> 9 gaborcsardi conditions NA 0 #> 10 gaborcsardi crayon NA 52 #> # ℹ 166 more rows repos %>% hoist(repo, owner = \"owner\") %>% unnest_wider(owner) #> # A tibble: 176 × 18 #> login id avatar_url gravatar_id url html_url followers_url #> #> 1 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 2 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 3 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 4 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 5 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 6 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 7 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 8 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 9 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> 10 gaborcsardi 660288 https://av… \"\" http… https:/… https://api.… #> # ℹ 166 more rows #> # ℹ 11 more variables: following_url , gists_url , #> # starred_url , subscriptions_url , organizations_url , #> # repos_url , events_url , received_events_url , #> # type , site_admin , repo "},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"game-of-thrones-characters","dir":"Articles","previous_headings":"","what":"Game of Thrones characters","title":"Rectangling","text":"got_chars similar structure gh_users: ’s list named lists, element inner list describes attribute GoT character. start way, first creating data frame unnesting component column: complex gh_users component char list, giving us collection list-columns: next depend purposes analysis. 
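An aside, not in the vignette: select_if() still works but is superseded in current dplyr; assuming the chars2 tibble built in this section's code, the same list-column check can be written with where().

chars2 %>% select(where(is.list))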
Maybe want row every book TV series character appears : maybe want build table lets match title name: (Note empty titles (\"\") due infelicity input got_chars: ideally people without titles title vector length 0, title vector length 1 containing empty string.)","code":"chars <- tibble(char = got_chars) chars #> # A tibble: 30 × 1 #> char #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #> # ℹ 20 more rows chars2 <- chars %>% unnest_wider(char) chars2 #> # A tibble: 30 × 18 #> url id name gender culture born died alive titles aliases #> #> 1 https://ww… 1022 Theo… Male \"Ironb… \"In … \"\" TRUE #> 2 https://ww… 1052 Tyri… Male \"\" \"In … \"\" TRUE #> 3 https://ww… 1074 Vict… Male \"Ironb… \"In … \"\" TRUE #> 4 https://ww… 1109 Will Male \"\" \"\" \"In … FALSE #> 5 https://ww… 1166 Areo… Male \"Norvo… \"In … \"\" TRUE #> 6 https://ww… 1267 Chett Male \"\" \"At … \"In … FALSE #> 7 https://ww… 1295 Cres… Male \"\" \"In … \"In … FALSE #> 8 https://ww… 130 Aria… Female \"Dorni… \"In … \"\" TRUE #> 9 https://ww… 1303 Daen… Female \"Valyr… \"In … \"\" TRUE #> 10 https://ww… 1319 Davo… Male \"Weste… \"In … \"\" TRUE #> # ℹ 20 more rows #> # ℹ 8 more variables: father , mother , spouse , #> # allegiances , books , povBooks , tvSeries , #> # playedBy chars2 %>% select_if(is.list) #> # A tibble: 30 × 7 #> titles aliases allegiances books povBooks tvSeries playedBy #> #> 1 #> 2 #> 3 #> 4 #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #> # ℹ 20 more rows chars2 %>% select(name, books, tvSeries) %>% pivot_longer(c(books, tvSeries), names_to = \"media\", values_to = \"value\") %>% unnest_longer(value) #> # A tibble: 179 × 3 #> name media value #> #> 1 Theon Greyjoy books A Game of Thrones #> 2 Theon Greyjoy books A Storm of Swords #> 3 Theon Greyjoy books A Feast for Crows #> 4 Theon Greyjoy tvSeries Season 1 #> 5 Theon Greyjoy tvSeries Season 2 #> 6 Theon Greyjoy tvSeries Season 3 #> 7 Theon Greyjoy tvSeries Season 4 #> 8 Theon Greyjoy tvSeries Season 5 #> 9 Theon Greyjoy tvSeries Season 6 #> 10 Tyrion Lannister books A Feast for Crows #> # ℹ 169 more rows chars2 %>% select(name, title = titles) %>% unnest_longer(title) #> # A tibble: 59 × 2 #> name title #> #> 1 Theon Greyjoy \"Prince of Winterfell\" #> 2 Theon Greyjoy \"Lord of the Iron Islands (by law of the green lands… #> 3 Tyrion Lannister \"Acting Hand of the King (former)\" #> 4 Tyrion Lannister \"Master of Coin (former)\" #> 5 Victarion Greyjoy \"Lord Captain of the Iron Fleet\" #> 6 Victarion Greyjoy \"Master of the Iron Victory\" #> 7 Will \"\" #> 8 Areo Hotah \"Captain of the Guard at Sunspear\" #> 9 Chett \"\" #> 10 Cressen \"Maester\" #> # ℹ 49 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"geocoding-with-google","dir":"Articles","previous_headings":"","what":"Geocoding with google","title":"Rectangling","text":"Next ’ll tackle complex form data comes Google’s geocoding service, stored repurssive package json list-column named lists, makes sense start unnest_wider(): Notice results list lists. cities 1 element (representing unique match geocoding API), Washington Arlington two. 
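A quick check, not in the vignette, that makes that claim concrete: lengths() on the results list-column counts the matches returned for each city.

repurrrsive::gmaps_cities %>%
  unnest_wider(json) %>%
  mutate(n_results = lengths(results)) %>%
  select(city, n_results, status)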
can pull separate rows unnest_longer(): Now components, revealed unnest_wider(): can find latitude longitude unnesting geometry: location: also just look first address city: use hoist() dive deeply get directly lat lng:","code":"repurrrsive::gmaps_cities #> # A tibble: 5 × 2 #> city json #> #> 1 Houston #> 2 Washington #> 3 New York #> 4 Chicago #> 5 Arlington repurrrsive::gmaps_cities %>% unnest_wider(json) #> # A tibble: 5 × 3 #> city results status #> #> 1 Houston OK #> 2 Washington OK #> 3 New York OK #> 4 Chicago OK #> 5 Arlington OK repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) #> # A tibble: 7 × 3 #> city results status #> #> 1 Houston OK #> 2 Washington OK #> 3 Washington OK #> 4 New York OK #> 5 Chicago OK #> 6 Arlington OK #> 7 Arlington OK repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) %>% unnest_wider(results) #> # A tibble: 7 × 7 #> city address_components formatted_address geometry place_id types #> #> 1 Houst… Houston, TX, USA ChIJAYW… #> 2 Washi… Washington, USA ChIJ-bD… #> 3 Washi… Washington, DC, … ChIJW-T… #> 4 New Y… New York, NY, USA ChIJOwg… #> 5 Chica… Chicago, IL, USA ChIJ7cv… #> 6 Arlin… Arlington, TX, U… ChIJ05g… #> 7 Arlin… Arlington, VA, U… ChIJD6e… #> # ℹ 1 more variable: status repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) %>% unnest_wider(results) %>% unnest_wider(geometry) #> # A tibble: 7 × 10 #> city address_components formatted_address bounds location #> #> 1 Houston Houston, TX, USA #> 2 Washingt… Washington, USA #> 3 Washingt… Washington, DC, … #> 4 New York New York, NY, USA #> 5 Chicago Chicago, IL, USA #> 6 Arlington Arlington, TX, U… #> 7 Arlington Arlington, VA, U… #> # ℹ 5 more variables: location_type , viewport , #> # place_id , types , status repurrrsive::gmaps_cities %>% unnest_wider(json) %>% unnest_longer(results) %>% unnest_wider(results) %>% unnest_wider(geometry) %>% unnest_wider(location) #> # A tibble: 7 × 11 #> city address_components formatted_address bounds lat lng #> #> 1 Houston Houston, TX, USA 29.8 -95.4 #> 2 Washingt… Washington, USA 47.8 -121. #> 3 Washingt… Washington, DC, … 38.9 -77.0 #> 4 New York New York, NY, USA 40.7 -74.0 #> 5 Chicago Chicago, IL, USA 41.9 -87.6 #> 6 Arlington Arlington, TX, U… 32.7 -97.1 #> 7 Arlington Arlington, VA, U… 38.9 -77.1 #> # ℹ 5 more variables: location_type , viewport , #> # place_id , types , status repurrrsive::gmaps_cities %>% unnest_wider(json) %>% hoist(results, first_result = 1) %>% unnest_wider(first_result) %>% unnest_wider(geometry) %>% unnest_wider(location) #> # A tibble: 5 × 12 #> city address_components formatted_address bounds lat lng #> #> 1 Houston Houston, TX, USA 29.8 -95.4 #> 2 Washingt… Washington, USA 47.8 -121. #> 3 New York New York, NY, USA 40.7 -74.0 #> 4 Chicago Chicago, IL, USA 41.9 -87.6 #> 5 Arlington Arlington, TX, U… 32.7 -97.1 #> # ℹ 6 more variables: location_type , viewport , #> # place_id , types , results , status repurrrsive::gmaps_cities %>% hoist(json, lat = list(\"results\", 1, \"geometry\", \"location\", \"lat\"), lng = list(\"results\", 1, \"geometry\", \"location\", \"lng\") ) #> # A tibble: 5 × 4 #> city lat lng json #> #> 1 Houston 29.8 -95.4 #> 2 Washington 47.8 -121. 
#> 3 New York 40.7 -74.0 #> 4 Chicago 41.9 -87.6 #> 5 Arlington 32.7 -97.1 "},{"path":"https://tidyr.tidyverse.org/dev/articles/rectangle.html","id":"sharla-gelfands-discography","dir":"Articles","previous_headings":"","what":"Sharla Gelfand’s discography","title":"Rectangling","text":"’ll finish complex list, Sharla Gelfand’s discography. ’ll start usual way: putting list single column data frame, widening component column. also parse date_added column real date-time1. level, see information disc added Sharla’s discography, information disc . need widen basic_information column: Unfortunately fails ’s id column inside basic_information. can quickly see ’s going setting names_repair = \"unique\": problem basic_information repeats id column ’s also stored top-level, can just drop : Alternatively, use hoist(): quickly extract name first label artist indexing deeply nested list. systematic approach create separate tables artist label: join back original dataset needed.","code":"discs <- tibble(disc = discog) %>% unnest_wider(disc) %>% mutate(date_added = as.POSIXct(strptime(date_added, \"%Y-%m-%dT%H:%M:%S\"))) discs #> # A tibble: 155 × 5 #> instance_id date_added basic_information id rating #> #> 1 354823933 2019-02-16 17:48:59 7496378 0 #> 2 354092601 2019-02-13 14:13:11 4490852 0 #> 3 354091476 2019-02-13 14:07:23 9827276 0 #> 4 351244906 2019-02-02 11:39:58 9769203 0 #> 5 351244801 2019-02-02 11:39:37 7237138 0 #> 6 351052065 2019-02-01 20:40:53 13117042 0 #> 7 350315345 2019-01-29 15:48:37 7113575 0 #> 8 350315103 2019-01-29 15:47:22 10540713 0 #> 9 350314507 2019-01-29 15:44:08 11260950 0 #> 10 350314047 2019-01-29 15:41:35 11726853 0 #> # ℹ 145 more rows discs %>% unnest_wider(basic_information) #> Error in `unnest_wider()`: #> ! Can't duplicate names between the affected columns and the #> original data. #> ✖ These names are duplicated: #> ℹ `id`, from `basic_information`. #> ℹ Use `names_sep` to disambiguate using the column name. #> ℹ Or use `names_repair` to specify a repair strategy. 
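# Sketch (not shown in the vignette): the error above also points at `names_sep`,
# which prefixes every unnested column with the list-column's name instead of
# repairing or dropping anything, so the two `id` columns no longer collide.
discs %>%
  unnest_wider(basic_information, names_sep = "_")
# columns are now named basic_information_labels, basic_information_id, ...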
discs %>% unnest_wider(basic_information, names_repair = \"unique\") #> New names: #> • `id` -> `id...7` #> • `id` -> `id...14` #> # A tibble: 155 × 15 #> instance_id date_added labels year master_url artists id...7 #> #> 1 354823933 2019-02-16 17:48:59 2015 NA 7.50e6 #> 2 354092601 2019-02-13 14:13:11 2013 https://ap… 4.49e6 #> 3 354091476 2019-02-13 14:07:23 2017 https://ap… 9.83e6 #> 4 351244906 2019-02-02 11:39:58 2017 https://ap… 9.77e6 #> 5 351244801 2019-02-02 11:39:37 2015 https://ap… 7.24e6 #> 6 351052065 2019-02-01 20:40:53 2019 https://ap… 1.31e7 #> 7 350315345 2019-01-29 15:48:37 2014 https://ap… 7.11e6 #> 8 350315103 2019-01-29 15:47:22 2015 https://ap… 1.05e7 #> 9 350314507 2019-01-29 15:44:08 2017 https://ap… 1.13e7 #> 10 350314047 2019-01-29 15:41:35 2017 NA 1.17e7 #> # ℹ 145 more rows #> # ℹ 8 more variables: thumb , title , formats , #> # cover_image , resource_url , master_id , #> # id...14 , rating discs %>% select(!id) %>% unnest_wider(basic_information) #> # A tibble: 155 × 14 #> instance_id date_added labels year master_url artists id #> #> 1 354823933 2019-02-16 17:48:59 2015 NA 7.50e6 #> 2 354092601 2019-02-13 14:13:11 2013 https://ap… 4.49e6 #> 3 354091476 2019-02-13 14:07:23 2017 https://ap… 9.83e6 #> 4 351244906 2019-02-02 11:39:58 2017 https://ap… 9.77e6 #> 5 351244801 2019-02-02 11:39:37 2015 https://ap… 7.24e6 #> 6 351052065 2019-02-01 20:40:53 2019 https://ap… 1.31e7 #> 7 350315345 2019-01-29 15:48:37 2014 https://ap… 7.11e6 #> 8 350315103 2019-01-29 15:47:22 2015 https://ap… 1.05e7 #> 9 350314507 2019-01-29 15:44:08 2017 https://ap… 1.13e7 #> 10 350314047 2019-01-29 15:41:35 2017 NA 1.17e7 #> # ℹ 145 more rows #> # ℹ 7 more variables: thumb , title , formats , #> # cover_image , resource_url , master_id , rating discs %>% hoist(basic_information, title = \"title\", year = \"year\", label = list(\"labels\", 1, \"name\"), artist = list(\"artists\", 1, \"name\") ) #> # A tibble: 155 × 9 #> instance_id date_added title year label artist #> #> 1 354823933 2019-02-16 17:48:59 Demo 2015 Tobi… Mollot #> 2 354092601 2019-02-13 14:13:11 Observant Com El Mo… 2013 La V… Una B… #> 3 354091476 2019-02-13 14:07:23 I 2017 La V… S.H.I… #> 4 351244906 2019-02-02 11:39:58 Oído Absoluto 2017 La V… Rata … #> 5 351244801 2019-02-02 11:39:37 A Cat's Cause, No D… 2015 Kato… Ivy (… #> 6 351052065 2019-02-01 20:40:53 Tashme 2019 High… Tashme #> 7 350315345 2019-01-29 15:48:37 Demo 2014 Mind… Desgr… #> 8 350315103 2019-01-29 15:47:22 Let The Miracles Be… 2015 Not … Phant… #> 9 350314507 2019-01-29 15:44:08 Sub Space 2017 Not … Sub S… #> 10 350314047 2019-01-29 15:41:35 Demo 2017 Pres… Small… #> # ℹ 145 more rows #> # ℹ 3 more variables: basic_information , id , rating discs %>% hoist(basic_information, artist = \"artists\") %>% select(disc_id = id, artist) %>% unnest_longer(artist) %>% unnest_wider(artist) #> # A tibble: 167 × 8 #> disc_id join name anv tracks role resource_url id #> #> 1 7496378 \"\" Mollot \"\" \"\" \"\" https://api… 4.62e6 #> 2 4490852 \"\" Una Bèstia Incon… \"\" \"\" \"\" https://api… 3.19e6 #> 3 9827276 \"\" S.H.I.T. 
(3) \"\" \"\" \"\" https://api… 2.77e6 #> 4 9769203 \"\" Rata Negra \"\" \"\" \"\" https://api… 4.28e6 #> 5 7237138 \"\" Ivy (18) \"\" \"\" \"\" https://api… 3.60e6 #> 6 13117042 \"\" Tashme \"\" \"\" \"\" https://api… 5.21e6 #> 7 7113575 \"\" Desgraciados \"\" \"\" \"\" https://api… 4.45e6 #> 8 10540713 \"\" Phantom Head \"\" \"\" \"\" https://api… 4.27e6 #> 9 11260950 \"\" Sub Space (2) \"\" \"\" \"\" https://api… 5.69e6 #> 10 11726853 \"\" Small Man (2) \"\" \"\" \"\" https://api… 6.37e6 #> # ℹ 157 more rows discs %>% hoist(basic_information, format = \"formats\") %>% select(disc_id = id, format) %>% unnest_longer(format) %>% unnest_wider(format) %>% unnest_longer(descriptions) #> # A tibble: 258 × 5 #> disc_id descriptions text name qty #> #> 1 7496378 \"Numbered\" Black Cassette 1 #> 2 4490852 \"LP\" NA Vinyl 1 #> 3 9827276 \"7\\\"\" NA Vinyl 1 #> 4 9827276 \"45 RPM\" NA Vinyl 1 #> 5 9827276 \"EP\" NA Vinyl 1 #> 6 9769203 \"LP\" NA Vinyl 1 #> 7 9769203 \"Album\" NA Vinyl 1 #> 8 7237138 \"7\\\"\" NA Vinyl 1 #> 9 7237138 \"45 RPM\" NA Vinyl 1 #> 10 13117042 \"7\\\"\" NA Vinyl 1 #> # ℹ 248 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"data-tidying","dir":"Articles","previous_headings":"","what":"Data tidying","title":"Tidy data","text":"often said 80% data analysis spent cleaning preparing data. ’s just first step, must repeated many times course analysis new problems come light new data collected. get handle problem, paper focuses small, important, aspect data cleaning call data tidying: structuring datasets facilitate analysis. principles tidy data provide standard way organise data values within dataset. standard makes initial data cleaning easier don’t need start scratch reinvent wheel every time. tidy data standard designed facilitate initial exploration analysis data, simplify development data analysis tools work well together. Current tools often require translation. spend time munging output one tool can input another. Tidy datasets tidy tools work hand hand make data analysis easier, allowing focus interesting domain problem, uninteresting logistics data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"defining","dir":"Articles","previous_headings":"","what":"Defining tidy data","title":"Tidy data","text":"Happy families alike; every unhappy family unhappy way — Leo Tolstoy Like families, tidy datasets alike every messy dataset messy way. Tidy datasets provide standardized way link structure dataset (physical layout) semantics (meaning). section, ’ll provide standard vocabulary describing structure semantics dataset, use definitions define tidy data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"data-structure","dir":"Articles","previous_headings":"Defining tidy data","what":"Data structure","title":"Tidy data","text":"statistical datasets data frames made rows columns. columns almost always labeled rows sometimes labeled. following code provides data imaginary classroom format commonly seen wild. table three columns four rows, rows columns labeled. many ways structure underlying data. following table shows data , rows columns transposed. data , layout different. vocabulary rows columns simply rich enough describe two tables represent data. 
addition appearance, need way describe underlying semantics, meaning, values displayed table.","code":"library(tibble) classroom <- tribble( ~name, ~quiz1, ~quiz2, ~test1, \"Billy\", NA, \"D\", \"C\", \"Suzy\", \"F\", NA, NA, \"Lionel\", \"B\", \"C\", \"B\", \"Jenny\", \"A\", \"A\", \"B\" ) classroom #> # A tibble: 4 × 4 #> name quiz1 quiz2 test1 #> #> 1 Billy NA D C #> 2 Suzy F NA NA #> 3 Lionel B C B #> 4 Jenny A A B tribble( ~assessment, ~Billy, ~Suzy, ~Lionel, ~Jenny, \"quiz1\", NA, \"F\", \"B\", \"A\", \"quiz2\", \"D\", NA, \"C\", \"A\", \"test1\", \"C\", NA, \"B\", \"B\" ) #> # A tibble: 3 × 5 #> assessment Billy Suzy Lionel Jenny #> #> 1 quiz1 NA F B A #> 2 quiz2 D NA C A #> 3 test1 C NA B B"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"data-semantics","dir":"Articles","previous_headings":"Defining tidy data","what":"Data semantics","title":"Tidy data","text":"dataset collection values, usually either numbers (quantitative) strings (qualitative). Values organised two ways. Every value belongs variable observation. variable contains values measure underlying attribute (like height, temperature, duration) across units. observation contains values measured unit (like person, day, race) across attributes. tidy version classroom data looks like : (’ll learn functions work little later) makes values, variables, observations clear. dataset contains 36 values representing three variables 12 observations. variables : name, four possible values (Billy, Suzy, Lionel, Jenny). assessment, three possible values (quiz1, quiz2, test1). grade, five six values depending think missing value (, B, C, D, F, NA). tidy data frame explicitly tells us definition observation. classroom, every combination name assessment single measured observation. dataset also informs us missing values, can meaning. Billy absent first quiz, tried salvage grade. Suzy failed first quiz, decided drop class. calculate Billy’s final grade, might replace missing value F (might get second chance take quiz). However, want know class average Test 1, dropping Suzy’s structural missing value appropriate imputing new value. given dataset, ’s usually easy figure observations variables, surprisingly difficult precisely define variables observations general. example, columns classroom data height weight happy call variables. columns height width, less clear cut, might think height width values dimension variable. columns home phone work phone, treat two variables, fraud detection environment might want variables phone number number type use one phone number multiple people might suggest fraud. general rule thumb easier describe functional relationships variables (e.g., z linear combination x y, density ratio weight volume) rows, easier make comparisons groups observations (e.g., average group vs. average group b) groups columns. given analysis, may multiple levels observation. example, trial new allergy medication might three observational types: demographic data collected person (age, sex, race), medical data collected person day (number sneezes, redness eyes), meteorological data collected day (temperature, pollen count). Variables may change course analysis. Often variables raw data fine grained, may add extra modelling complexity little explanatory gain. example, many surveys ask variations question better get underlying trait. early stages analysis, variables correspond questions. later stages, change focus traits, computed averaging together multiple questions. 
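A toy sketch of that last point, using hypothetical survey data (not from the paper) and assuming tidyr and dplyr are attached: several question columns are averaged into a single trait score.

survey <- tibble(
  person = c("a", "b", "c"),
  q1 = c(4, 2, 5), q2 = c(3, 3, 5), q3 = c(5, 2, 4)
)
survey %>%
  pivot_longer(q1:q3, names_to = "question", values_to = "answer") %>%
  group_by(person) %>%
  summarise(trait = mean(answer))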
considerably simplifies analysis don’t need hierarchical model, can often pretend data continuous, discrete.","code":"library(tidyr) library(dplyr) classroom2 <- classroom %>% pivot_longer(quiz1:test1, names_to = \"assessment\", values_to = \"grade\") %>% arrange(name, assessment) classroom2 #> # A tibble: 12 × 3 #> name assessment grade #> #> 1 Billy quiz1 NA #> 2 Billy quiz2 D #> 3 Billy test1 C #> 4 Jenny quiz1 A #> 5 Jenny quiz2 A #> 6 Jenny test1 B #> 7 Lionel quiz1 B #> 8 Lionel quiz2 C #> 9 Lionel test1 B #> 10 Suzy quiz1 F #> # ℹ 2 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"tidy-data","dir":"Articles","previous_headings":"Defining tidy data","what":"Tidy data","title":"Tidy data","text":"Tidy data standard way mapping meaning dataset structure. dataset messy tidy depending rows, columns tables matched observations, variables types. tidy data: variable column; column variable. observation row; row observation. value cell; cell single value. Codd’s 3rd normal form, constraints framed statistical language, focus put single dataset rather many connected datasets common relational databases. Messy data arrangement data. Tidy data makes easy analyst computer extract needed variables provides standard way structuring dataset. Compare different versions classroom data: messy version need use different strategies extract different variables. slows analysis invites errors. consider many data analysis operations involve values variable (every aggregation function), can see important extract values simple, standard way. Tidy data particularly well suited vectorised programming languages like R, layout ensures values different variables observation always paired. order variables observations affect analysis, good ordering makes easier scan raw values. One way organising variables role analysis: values fixed design data collection, measured course experiment? Fixed variables describe experimental design known advance. Computer scientists often call fixed variables dimensions, statisticians usually denote subscripts random variables. Measured variables actually measure study. Fixed variables come first, followed measured variables, ordered related variables contiguous. Rows can ordered first variable, breaking ties second subsequent (fixed) variables. convention adopted tabular displays paper.","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"tidying","dir":"Articles","previous_headings":"","what":"Tidying messy datasets","title":"Tidy data","text":"Real datasets can, often , violate three precepts tidy data almost every way imaginable. occasionally get dataset can start analysing immediately, exception, rule. section describes five common problems messy datasets, along remedies: Column headers values, variable names. Multiple variables stored one column. Variables stored rows columns. Multiple types observational units stored table. single observational unit stored multiple tables. Surprisingly, messy datasets, including types messiness explicitly described , can tidied small set tools: pivoting (longer wider) separating. 
following sections illustrate problem real dataset encountered, show tidy .","code":""},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"column-headers-are-values-not-variable-names","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Column headers are values, not variable names","title":"Tidy data","text":"common type messy dataset tabular data designed presentation, variables form rows columns, column headers values, variable names. call arrangement messy, cases can extremely useful. provides efficient storage completely crossed designs, can lead extremely efficient computation desired operations can expressed matrix operations. following code shows subset typical dataset form. dataset explores relationship income religion US. comes report produced Pew Research Center, American think-tank collects data attitudes topics ranging religion internet, produces many reports contain datasets format. dataset three variables, religion, income frequency. tidy , need pivot non-variable columns two-column key-value pair. action often described making wide dataset longer (taller). pivoting variables, need provide name new key-value columns create. defining columns pivot (every column except religion), need name key column, name variable defined values column headings. case, ’s income. second argument name value column, frequency. form tidy column represents variable row represents observation, case demographic unit corresponding combination religion income. format also used record regularly spaced observations time. example, Billboard dataset shown records date song first entered billboard top 100. variables artist, track, date.entered, rank week. rank week enters top 100 recorded 75 columns, wk1 wk75. form storage tidy, useful data entry. reduces duplication since otherwise song week need row, song metadata like title artist need repeated. discussed depth multiple types. tidy dataset, first use pivot_longer() make dataset longer. transform columns wk1 wk76, making new column names, week, new value values, rank: use values_drop_na = TRUE drop missing values rank column. data, missing values represent weeks song wasn’t charts, can safely dropped. case ’s also nice little cleaning, converting week variable number, figuring date corresponding week charts: Finally, ’s always good idea sort data. 
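As an aside (not in the paper), the week clean-up described above can also be folded straight into pivot_longer() via names_prefix and names_transform:

billboard %>%
  pivot_longer(
    wk1:wk76,
    names_to = "week",
    names_prefix = "wk",
    names_transform = list(week = as.integer),
    values_to = "rank",
    values_drop_na = TRUE
  )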
artist, track week: date rank:","code":"relig_income #> # A tibble: 18 × 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` #> #> 1 Agnostic 27 34 60 81 76 137 #> 2 Atheist 12 27 37 52 35 70 #> 3 Buddhist 27 21 30 34 33 58 #> 4 Catholic 418 617 732 670 638 1116 #> 5 Don’t know/r… 15 14 15 11 10 35 #> 6 Evangelical … 575 869 1064 982 881 1486 #> 7 Hindu 1 9 7 9 11 34 #> 8 Historically… 228 244 236 238 197 223 #> 9 Jehovah's Wi… 20 27 24 24 21 30 #> 10 Jewish 19 19 25 25 30 95 #> # ℹ 8 more rows #> # ℹ 4 more variables: `$75-100k` , `$100-150k` , `>150k` , #> # `Don't know/refused` relig_income %>% pivot_longer(-religion, names_to = \"income\", values_to = \"frequency\") #> # A tibble: 180 × 3 #> religion income frequency #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows billboard #> # A tibble: 317 × 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 #> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 #> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA #> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 #> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 #> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 #> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA #> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 #> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 #> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 #> # ℹ 307 more rows #> # ℹ 69 more variables: wk8 , wk9 , wk10 , wk11 , #> # wk12 , wk13 , wk14 , wk15 , wk16 , #> # wk17 , wk18 , wk19 , wk20 , wk21 , #> # wk22 , wk23 , wk24 , wk25 , wk26 , #> # wk27 , wk28 , wk29 , wk30 , wk31 , #> # wk32 , wk33 , wk34 , wk35 , wk36 , … billboard2 <- billboard %>% pivot_longer( wk1:wk76, names_to = \"week\", values_to = \"rank\", values_drop_na = TRUE ) billboard2 #> # A tibble: 5,307 × 5 #> artist track date.entered week rank #> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87 #> 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92 #> # ℹ 5,297 more rows billboard3 <- billboard2 %>% mutate( week = as.integer(gsub(\"wk\", \"\", week)), date = as.Date(date.entered) + 7 * (week - 1), date.entered = NULL ) billboard3 #> # A tibble: 5,307 × 5 #> artist track week rank date #> #> 1 2 Pac Baby Don't Cry (Keep... 1 87 2000-02-26 #> 2 2 Pac Baby Don't Cry (Keep... 2 82 2000-03-04 #> 3 2 Pac Baby Don't Cry (Keep... 3 72 2000-03-11 #> 4 2 Pac Baby Don't Cry (Keep... 4 77 2000-03-18 #> 5 2 Pac Baby Don't Cry (Keep... 5 87 2000-03-25 #> 6 2 Pac Baby Don't Cry (Keep... 6 94 2000-04-01 #> 7 2 Pac Baby Don't Cry (Keep... 7 99 2000-04-08 #> 8 2Ge+her The Hardest Part Of ... 1 91 2000-09-02 #> 9 2Ge+her The Hardest Part Of ... 2 87 2000-09-09 #> 10 2Ge+her The Hardest Part Of ... 
3 92 2000-09-16 #> # ℹ 5,297 more rows billboard3 %>% arrange(artist, track, week) #> # A tibble: 5,307 × 5 #> artist track week rank date #> #> 1 2 Pac Baby Don't Cry (Keep... 1 87 2000-02-26 #> 2 2 Pac Baby Don't Cry (Keep... 2 82 2000-03-04 #> 3 2 Pac Baby Don't Cry (Keep... 3 72 2000-03-11 #> 4 2 Pac Baby Don't Cry (Keep... 4 77 2000-03-18 #> 5 2 Pac Baby Don't Cry (Keep... 5 87 2000-03-25 #> 6 2 Pac Baby Don't Cry (Keep... 6 94 2000-04-01 #> 7 2 Pac Baby Don't Cry (Keep... 7 99 2000-04-08 #> 8 2Ge+her The Hardest Part Of ... 1 91 2000-09-02 #> 9 2Ge+her The Hardest Part Of ... 2 87 2000-09-09 #> 10 2Ge+her The Hardest Part Of ... 3 92 2000-09-16 #> # ℹ 5,297 more rows billboard3 %>% arrange(date, rank) #> # A tibble: 5,307 × 5 #> artist track week rank date #> #> 1 Lonestar Amazed 1 81 1999-06-05 #> 2 Lonestar Amazed 2 54 1999-06-12 #> 3 Lonestar Amazed 3 44 1999-06-19 #> 4 Lonestar Amazed 4 39 1999-06-26 #> 5 Lonestar Amazed 5 38 1999-07-03 #> 6 Lonestar Amazed 6 33 1999-07-10 #> 7 Lonestar Amazed 7 29 1999-07-17 #> 8 Amber Sexual 1 99 1999-07-17 #> 9 Lonestar Amazed 8 29 1999-07-24 #> 10 Amber Sexual 2 99 1999-07-24 #> # ℹ 5,297 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"multiple-variables-stored-in-one-column","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Multiple variables stored in one column","title":"Tidy data","text":"pivoting columns, key column sometimes combination multiple underlying variable names. happens tb (tuberculosis) dataset, shown . dataset comes World Health Organisation, records counts confirmed tuberculosis cases country, year, demographic group. demographic groups broken sex (m, f) age (0-14, 15-25, 25-34, 35-44, 45-54, 55-64, unknown). First use pivot_longer() gather non-variable columns: Column headers format often separated non-alphanumeric character (e.g. ., -, _, :), fixed width format, like dataset. separate() makes easy split compound variables individual variables. can either pass regular expression split (default split non-alphanumeric columns), vector character positions. case want split first character: Storing values form resolves problem original data. want compare rates, counts, means need know population. original format, easy way add population variable. stored separate table, makes hard correctly match populations counts. tidy form, adding variables population rate easy ’re just additional columns. 
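A hypothetical sketch of that last point: given the tidy tb3 produced by the separate() step described above, population (here a made-up lookup table) is just a join and a mutate away.

population <- tibble(iso2 = "AD", year = 1996, pop = 64000)  # invented numbers
tb3 %>%
  left_join(population, by = c("iso2", "year")) %>%
  mutate(rate = n / pop)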
case, also transformation single step supplying multiple column names names_to also supplying grouped regular expression names_pattern:","code":"tb <- as_tibble(read.csv(\"tb.csv\", stringsAsFactors = FALSE)) tb #> # A tibble: 5,769 × 22 #> iso2 year m04 m514 m014 m1524 m2534 m3544 m4554 m5564 m65 mu #> #> 1 AD 1989 NA NA NA NA NA NA NA NA NA NA #> 2 AD 1990 NA NA NA NA NA NA NA NA NA NA #> 3 AD 1991 NA NA NA NA NA NA NA NA NA NA #> 4 AD 1992 NA NA NA NA NA NA NA NA NA NA #> 5 AD 1993 NA NA NA NA NA NA NA NA NA NA #> 6 AD 1994 NA NA NA NA NA NA NA NA NA NA #> 7 AD 1996 NA NA 0 0 0 4 1 0 0 NA #> 8 AD 1997 NA NA 0 0 1 2 2 1 6 NA #> 9 AD 1998 NA NA 0 0 0 1 0 0 0 NA #> 10 AD 1999 NA NA 0 0 0 1 1 0 0 NA #> # ℹ 5,759 more rows #> # ℹ 10 more variables: f04 , f514 , f014 , f1524 , #> # f2534 , f3544 , f4554 , f5564 , f65 , #> # fu tb2 <- tb %>% pivot_longer( !c(iso2, year), names_to = \"demo\", values_to = \"n\", values_drop_na = TRUE ) tb2 #> # A tibble: 35,750 × 4 #> iso2 year demo n #> #> 1 AD 1996 m014 0 #> 2 AD 1996 m1524 0 #> 3 AD 1996 m2534 0 #> 4 AD 1996 m3544 4 #> 5 AD 1996 m4554 1 #> 6 AD 1996 m5564 0 #> 7 AD 1996 m65 0 #> 8 AD 1996 f014 0 #> 9 AD 1996 f1524 1 #> 10 AD 1996 f2534 1 #> # ℹ 35,740 more rows tb3 <- tb2 %>% separate(demo, c(\"sex\", \"age\"), 1) tb3 #> # A tibble: 35,750 × 5 #> iso2 year sex age n #> #> 1 AD 1996 m 014 0 #> 2 AD 1996 m 1524 0 #> 3 AD 1996 m 2534 0 #> 4 AD 1996 m 3544 4 #> 5 AD 1996 m 4554 1 #> 6 AD 1996 m 5564 0 #> 7 AD 1996 m 65 0 #> 8 AD 1996 f 014 0 #> 9 AD 1996 f 1524 1 #> 10 AD 1996 f 2534 1 #> # ℹ 35,740 more rows tb %>% pivot_longer( !c(iso2, year), names_to = c(\"sex\", \"age\"), names_pattern = \"(.)(.+)\", values_to = \"n\", values_drop_na = TRUE ) #> # A tibble: 35,750 × 5 #> iso2 year sex age n #> #> 1 AD 1996 m 014 0 #> 2 AD 1996 m 1524 0 #> 3 AD 1996 m 2534 0 #> 4 AD 1996 m 3544 4 #> 5 AD 1996 m 4554 1 #> 6 AD 1996 m 5564 0 #> 7 AD 1996 m 65 0 #> 8 AD 1996 f 014 0 #> 9 AD 1996 f 1524 1 #> 10 AD 1996 f 2534 1 #> # ℹ 35,740 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"variables-are-stored-in-both-rows-and-columns","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Variables are stored in both rows and columns","title":"Tidy data","text":"complicated form messy data occurs variables stored rows columns. code loads daily weather data Global Historical Climatology Network one weather station (MX17004) Mexico five months 2010. variables individual columns (id, year, month), spread across columns (day, d1-d31) across rows (tmin, tmax) (minimum maximum temperature). Months fewer 31 days structural missing values last day(s) month. tidy dataset first use pivot_longer gather day columns: presentation, ’ve dropped missing values, making implicit rather explicit. ok know many days month can easily reconstruct explicit missing values. ’ll also little cleaning: dataset mostly tidy, element column variable; stores names variables. (shown example meteorological variables prcp (precipitation) snow (snowfall)). 
Fixing requires widening data: pivot_wider() inverse pivot_longer(), pivoting element value back across multiple columns: form tidy: ’s one variable column, row represents one day.","code":"weather <- as_tibble(read.csv(\"weather.csv\", stringsAsFactors = FALSE)) weather #> # A tibble: 22 × 35 #> id year month element d1 d2 d3 d4 d5 d6 d7 #> #> 1 MX17004 2010 1 tmax NA NA NA NA NA NA NA #> 2 MX17004 2010 1 tmin NA NA NA NA NA NA NA #> 3 MX17004 2010 2 tmax NA 27.3 24.1 NA NA NA NA #> 4 MX17004 2010 2 tmin NA 14.4 14.4 NA NA NA NA #> 5 MX17004 2010 3 tmax NA NA NA NA 32.1 NA NA #> 6 MX17004 2010 3 tmin NA NA NA NA 14.2 NA NA #> 7 MX17004 2010 4 tmax NA NA NA NA NA NA NA #> 8 MX17004 2010 4 tmin NA NA NA NA NA NA NA #> 9 MX17004 2010 5 tmax NA NA NA NA NA NA NA #> 10 MX17004 2010 5 tmin NA NA NA NA NA NA NA #> # ℹ 12 more rows #> # ℹ 24 more variables: d8 , d9 , d10 , d11 , #> # d12 , d13 , d14 , d15 , d16 , d17 , #> # d18 , d19 , d20 , d21 , d22 , d23 , #> # d24 , d25 , d26 , d27 , d28 , d29 , #> # d30 , d31 weather2 <- weather %>% pivot_longer( d1:d31, names_to = \"day\", values_to = \"value\", values_drop_na = TRUE ) weather2 #> # A tibble: 66 × 6 #> id year month element day value #> #> 1 MX17004 2010 1 tmax d30 27.8 #> 2 MX17004 2010 1 tmin d30 14.5 #> 3 MX17004 2010 2 tmax d2 27.3 #> 4 MX17004 2010 2 tmax d3 24.1 #> 5 MX17004 2010 2 tmax d11 29.7 #> 6 MX17004 2010 2 tmax d23 29.9 #> 7 MX17004 2010 2 tmin d2 14.4 #> 8 MX17004 2010 2 tmin d3 14.4 #> 9 MX17004 2010 2 tmin d11 13.4 #> 10 MX17004 2010 2 tmin d23 10.7 #> # ℹ 56 more rows weather3 <- weather2 %>% mutate(day = as.integer(gsub(\"d\", \"\", day))) %>% select(id, year, month, day, element, value) weather3 #> # A tibble: 66 × 6 #> id year month day element value #> #> 1 MX17004 2010 1 30 tmax 27.8 #> 2 MX17004 2010 1 30 tmin 14.5 #> 3 MX17004 2010 2 2 tmax 27.3 #> 4 MX17004 2010 2 3 tmax 24.1 #> 5 MX17004 2010 2 11 tmax 29.7 #> 6 MX17004 2010 2 23 tmax 29.9 #> 7 MX17004 2010 2 2 tmin 14.4 #> 8 MX17004 2010 2 3 tmin 14.4 #> 9 MX17004 2010 2 11 tmin 13.4 #> 10 MX17004 2010 2 23 tmin 10.7 #> # ℹ 56 more rows weather3 %>% pivot_wider(names_from = element, values_from = value) #> # A tibble: 33 × 6 #> id year month day tmax tmin #> #> 1 MX17004 2010 1 30 27.8 14.5 #> 2 MX17004 2010 2 2 27.3 14.4 #> 3 MX17004 2010 2 3 24.1 14.4 #> 4 MX17004 2010 2 11 29.7 13.4 #> 5 MX17004 2010 2 23 29.9 10.7 #> 6 MX17004 2010 3 5 32.1 14.2 #> 7 MX17004 2010 3 10 34.5 16.8 #> 8 MX17004 2010 3 16 31.1 17.6 #> 9 MX17004 2010 4 27 36.3 16.7 #> 10 MX17004 2010 5 27 33.2 18.2 #> # ℹ 23 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"multiple-types","dir":"Articles","previous_headings":"Tidying messy datasets","what":"Multiple types in one table","title":"Tidy data","text":"Datasets often involve values collected multiple levels, different types observational units. tidying, type observational unit stored table. closely related idea database normalisation, fact expressed one place. ’s important otherwise inconsistencies can arise. billboard dataset actually contains observations two types observational units: song rank week. manifests duplication facts song: artist repeated many times. dataset needs broken two pieces: song dataset stores artist song name, ranking dataset gives rank song week. 
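A quick illustration (not in the paper) of that duplication: counting rows per song shows each song's metadata repeated once for every week it charted.

billboard3 %>% count(artist, track, sort = TRUE)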
first extract song dataset: use make rank dataset replacing repeated song facts pointer song details (unique song id): also imagine week dataset record background information week, maybe total number songs sold similar “demographic” information. Normalisation useful tidying eliminating inconsistencies. However, data analysis tools work directly relational data, analysis usually also requires denormalisation merging datasets back one table.","code":"song <- billboard3 %>% distinct(artist, track) %>% mutate(song_id = row_number()) song #> # A tibble: 317 × 3 #> artist track song_id #> #> 1 2 Pac Baby Don't Cry (Keep... 1 #> 2 2Ge+her The Hardest Part Of ... 2 #> 3 3 Doors Down Kryptonite 3 #> 4 3 Doors Down Loser 4 #> 5 504 Boyz Wobble Wobble 5 #> 6 98^0 Give Me Just One Nig... 6 #> 7 A*Teens Dancing Queen 7 #> 8 Aaliyah I Don't Wanna 8 #> 9 Aaliyah Try Again 9 #> 10 Adams, Yolanda Open My Heart 10 #> # ℹ 307 more rows rank <- billboard3 %>% left_join(song, c(\"artist\", \"track\")) %>% select(song_id, date, week, rank) rank #> # A tibble: 5,307 × 4 #> song_id date week rank #> #> 1 1 2000-02-26 1 87 #> 2 1 2000-03-04 2 82 #> 3 1 2000-03-11 3 72 #> 4 1 2000-03-18 4 77 #> 5 1 2000-03-25 5 87 #> 6 1 2000-04-01 6 94 #> 7 1 2000-04-08 7 99 #> 8 2 2000-09-02 1 91 #> 9 2 2000-09-09 2 87 #> 10 2 2000-09-16 3 92 #> # ℹ 5,297 more rows"},{"path":"https://tidyr.tidyverse.org/dev/articles/tidy-data.html","id":"one-type-in-multiple-tables","dir":"Articles","previous_headings":"Tidying messy datasets","what":"One type in multiple tables","title":"Tidy data","text":"’s also common find data values single type observational unit spread multiple tables files. tables files often split another variable, represents single year, person, location. long format individual records consistent, easy problem fix: Read files list tables. table, add new column records original file name (file name often value important variable). Combine tables single table. Purrr makes straightforward R. following code generates vector file names directory (data/) match regular expression (ends .csv). Next name element vector name file. preserve names following step, ensuring row final data frame labeled source. Finally, map_dfr() loops path, reading csv file combining results single data frame. single table, can perform additional tidying needed. example type cleaning can found https://github.com/hadley/data-baby-names takes 129 yearly baby name tables provided US Social Security Administration combines single file. complicated situation occurs dataset structure changes time. example, datasets may contain different variables, variables different names, different file formats, different conventions missing values. may require tidy file individually (, ’re lucky, small groups) combine tidied. example type tidying illustrated https://github.com/hadley/data-fuel-economy, shows tidying epa fuel economy data 50,000 cars 1978 2008. raw data available online, year stored separate file four major formats many minor variations, making tidying dataset considerable challenge.","code":"library(purrr) paths <- dir(\"data\", pattern = \"\\\\.csv$\", full.names = TRUE) names(paths) <- basename(paths) map_dfr(paths, read.csv, stringsAsFactors = FALSE, .id = \"filename\")"},{"path":"https://tidyr.tidyverse.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Hadley Wickham. Author, maintainer. Davis Vaughan. Author. Maximilian Girlich. Author. Kevin Ushey. Contributor. . 
Copyright holder, funder.","code":""},{"path":"https://tidyr.tidyverse.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Wickham H, Vaughan D, Girlich M (2024). tidyr: Tidy Messy Data. R package version 1.3.1.9000, https://github.com/tidyverse/tidyr, https://tidyr.tidyverse.org.","code":"@Manual{, title = {tidyr: Tidy Messy Data}, author = {Hadley Wickham and Davis Vaughan and Maximilian Girlich}, year = {2024}, note = {R package version 1.3.1.9000, https://github.com/tidyverse/tidyr}, url = {https://tidyr.tidyverse.org}, }"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"Tidy Messy Data","text":"goal tidyr help create tidy data. Tidy data data : variable column; column variable. observation row; row observation. value cell; cell single value. Tidy data describes standard way storing data used wherever possible throughout tidyverse. ensure data tidy, ’ll spend less time fighting tools time working analysis. Learn tidy data vignette(\"tidy-data\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tidy Messy Data","text":"","code":"# The easiest way to get tidyr is to install the whole tidyverse: install.packages(\"tidyverse\") # Alternatively, install just tidyr: install.packages(\"tidyr\") # Or the development version from GitHub: # install.packages(\"pak\") pak::pak(\"tidyverse/tidyr\")"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"getting-started","dir":"","previous_headings":"","what":"Getting started","title":"Tidy Messy Data","text":"tidyr functions fall five main categories: “Pivoting” converts long wide forms. tidyr 1.0.0 introduces pivot_longer() pivot_wider(), replacing older spread() gather() functions. See vignette(\"pivot\") details. “Rectangling”, turns deeply nested lists (JSON) tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), vignette(\"rectangle\") details. Nesting converts grouped data form group becomes single row containing nested data frame, unnesting opposite. See nest(), unnest(), vignette(\"nest\") details. Splitting combining character columns. Use separate_wider_delim(), separate_wider_position(), separate_wider_regex() pull single character column multiple columns; use unite() combine multiple columns single character column. Make implicit missing values explicit complete(); make explicit missing values implicit drop_na(); replace missing values next/previous value fill(), known value replace_na().","code":"library(tidyr)"},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"related-work","dir":"","previous_headings":"","what":"Related work","title":"Tidy Messy Data","text":"tidyr supersedes reshape2 (2010-2014) reshape (2005-2010). Somewhat counterintuitively, iteration package done less. tidyr designed specifically tidying data, general reshaping (reshape2), general aggregation (reshape). 
data.table provides high-performance implementations melt() dcast() ’d like read data reshaping CS perspective, ’d recommend following three papers: Wrangler: Interactive visual specification data transformation scripts interactive framework data cleaning (Potter’s wheel) efficiently implementing SchemaSQL SQL database system guide reading, ’s translation terminology used different places:","code":""},{"path":"https://tidyr.tidyverse.org/dev/index.html","id":"getting-help","dir":"","previous_headings":"","what":"Getting help","title":"Tidy Messy Data","text":"encounter clear bug, please file minimal reproducible example github. questions discussion, please use forum.posit.co. Please note tidyr project released Contributor Code Conduct. contributing project, agree abide terms.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":null,"dir":"Reference","previous_headings":"","what":"Song rankings for Billboard top 100 in the year 2000 — billboard","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"Song rankings Billboard top 100 year 2000","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"","code":"billboard"},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"dataset variables: artist Artist name track Song name date.enter Date song entered top 100 wk1 – wk76 Rank song week entered","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/billboard.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Song rankings for Billboard top 100 in the year 2000 — billboard","text":"\"Whitburn\" project, https://waxy.org/2008/05/the_whitburn_project/, (downloaded April 2008)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Check assumptions about a pivot spec — check_pivot_spec","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"check_pivot_spec() developer facing helper function validating pivot spec used pivot_longer_spec() pivot_wider_spec(). useful extending pivot_longer() pivot_wider() new S3 methods. check_pivot_spec() makes following assertions: spec must data frame. spec must character column named .name. spec must character column named .value. .name column must unique. .name .value columns must first two columns data frame, reordered true.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"","code":"check_pivot_spec(spec, call = caller_env())"},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"spec specification data frame. useful complex pivots gives greater control metadata stored columns become column names result. Must data frame containing character .name .value columns. 
Additional columns spec named match columns long format dataset contain values corresponding columns pivoted wide format. special .seq variable used disambiguate rows internally; automatically removed pivoting.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/check_pivot_spec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check assumptions about a pivot spec — check_pivot_spec","text":"","code":"# A valid spec spec <- tibble(.name = \"a\", .value = \"b\", foo = 1) check_pivot_spec(spec) #> # A tibble: 1 × 3 #> .name .value foo #> #> 1 a b 1 spec <- tibble(.name = \"a\") try(check_pivot_spec(spec)) #> Error in eval(expr, envir, enclos) : #> `spec` must have `.name` and `.value` columns. # `.name` and `.value` are forced to be the first two columns spec <- tibble(foo = 1, .value = \"b\", .name = \"a\") check_pivot_spec(spec) #> # A tibble: 1 × 3 #> .name .value foo #> #> 1 a b 1"},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":null,"dir":"Reference","previous_headings":"","what":"Chop and unchop — chop","title":"Chop and unchop — chop","text":"Chopping unchopping preserve width data frame, changing length. chop() makes df shorter converting rows within group list-columns. unchop() makes df longer expanding list-columns element list-column gets row output. chop() unchop() building blocks complicated functions (like unnest(), unnest_longer(), unnest_wider()) generally suitable programming interactive data analysis.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Chop and unchop — chop","text":"","code":"chop(data, cols, ..., error_call = current_env()) unchop( data, cols, ..., keep_empty = FALSE, ptype = NULL, error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Chop and unchop — chop","text":"data data frame. cols Columns chop unchop. unchop(), column list-column containing generalised vectors (e.g. mix NULLs, atomic vector, S3 vectors, lists, data frames). ... dots future extensions must empty. error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information. keep_empty default, get one row output element list unchopping/unnesting. means size-0 element (like NULL empty data frame vector), entire row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements single row missing values. ptype Optionally, named list column name-prototype pairs coerce cols , overriding default guessed combining individual values. Alternatively, single empty ptype can supplied, applied cols.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Chop and unchop — chop","text":"Generally, unchopping useful chopping simplifies complex data structure, nest()ing usually appropriate chop()ing since better preserves connections observations. chop() creates list-columns class vctrs::list_of() ensure consistent behaviour chopped data frame emptied. instance helps getting back original column types roundtrip chop unchop. 
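A minimal sketch of the roundtrip just described (not one of the documented examples):

df <- tibble(x = c(1, 1, 2), y = 1:3)
chopped <- chop(df, y)
chopped$y               # a list_of<integer>, so the element type is recorded
unchop(chopped, y)      # y comes back as an integer vector
unchop(chopped[0, ], y) # the type survives even when no rows are left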
keeps tracks type elements, unchop() able reconstitute correct vector type even empty list-columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/chop.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Chop and unchop — chop","text":"","code":"# Chop ---------------------------------------------------------------------- df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) # Note that we get one row of output for each unique combination of # non-chopped variables df %>% chop(c(y, z)) #> # A tibble: 3 × 3 #> x y z #> > > #> 1 1 [3] [3] #> 2 2 [2] [2] #> 3 3 [1] [1] # cf nest df %>% nest(data = c(y, z)) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # Unchop -------------------------------------------------------------------- df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3)) df %>% unchop(y) #> # A tibble: 6 × 2 #> x y #> #> 1 2 1 #> 2 3 1 #> 3 3 2 #> 4 4 1 #> 5 4 2 #> 6 4 3 df %>% unchop(y, keep_empty = TRUE) #> # A tibble: 7 × 2 #> x y #> #> 1 1 NA #> 2 2 1 #> 3 3 1 #> 4 3 2 #> 5 4 1 #> 6 4 2 #> 7 4 3 # unchop will error if the types are not compatible: df <- tibble(x = 1:2, y = list(\"1\", 1:3)) try(df %>% unchop(y)) #> Error in list_unchop(col, ptype = col_ptype) : #> Can't combine `x[[1]]` and `x[[2]]` . # Unchopping a list-col of data frames must generate a df-col because # unchop leaves the column names unchanged df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2))) df %>% unchop(y) #> # A tibble: 3 × 2 #> x y$x $y #> #> 1 2 1 NA #> 2 3 NA 1 #> 3 3 NA 2 df %>% unchop(y, keep_empty = TRUE) #> # A tibble: 4 × 2 #> x y$x $y #> #> 1 1 NA NA #> 2 2 1 NA #> 3 3 NA 1 #> 4 3 NA 2"},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":null,"dir":"Reference","previous_headings":"","what":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"Two datasets public data provided Centers Medicare & Medicaid Services, https://data.cms.gov. cms_patient_experience contains lightly cleaned data \"Hospice - Provider Data\", provides list hospice agencies along data quality patient care, https://data.cms.gov/provider-data/dataset/252m-zfp9. 
cms_patient_care \"Doctors Clinicians Quality Payment Program PY 2020 Virtual Group Public Reporting\", https://data.cms.gov/provider-data/dataset/8c70-d353","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"","code":"cms_patient_experience cms_patient_care"},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"cms_patient_experience data frame 500 observations five variables: org_pac_id,org_nm Organisation ID name measure_cd,measure_title Measure code title prf_rate Measure performance rate cms_patient_care data frame 252 observations five variables: ccn,facility_name Facility ID name measure_abbr Abbreviated measurement title, suitable use variable name score Measure score type Whether score refers rating 100 (\"observed\"), maximum possible value raw score (\"denominator\")","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/cms_patient_experience.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Data from the Centers for Medicare & Medicaid Services — cms_patient_experience","text":"","code":"cms_patient_experience %>% dplyr::distinct(measure_cd, measure_title) #> # A tibble: 6 × 2 #> measure_cd measure_title #> #> 1 CAHPS_GRP_1 CAHPS for MIPS SSM: Getting Timely Care, Appointments, and… #> 2 CAHPS_GRP_2 CAHPS for MIPS SSM: How Well Providers Communicate #> 3 CAHPS_GRP_3 CAHPS for MIPS SSM: Patient's Rating of Provider #> 4 CAHPS_GRP_5 CAHPS for MIPS SSM: Health Promotion and Education #> 5 CAHPS_GRP_8 CAHPS for MIPS SSM: Courteous and Helpful Office Staff #> 6 CAHPS_GRP_12 CAHPS for MIPS SSM: Stewardship of Patient Resources cms_patient_experience %>% pivot_wider( id_cols = starts_with(\"org\"), names_from = measure_cd, values_from = prf_rate ) #> # A tibble: 95 × 8 #> org_pac_id org_nm CAHPS_GRP_1 CAHPS_GRP_2 CAHPS_GRP_3 CAHPS_GRP_5 #> #> 1 0446157747 USC CARE ME… 63 87 86 57 #> 2 0446162697 ASSOCIATION… 59 85 83 63 #> 3 0547164295 BEAVER MEDI… 49 NA 75 44 #> 4 0749333730 CAPE PHYSIC… 67 84 85 65 #> 5 0840104360 ALLIANCE PH… 66 87 87 64 #> 6 0840109864 REX HOSPITA… 73 87 84 67 #> 7 0840513552 SCL HEALTH … 58 83 76 58 #> 8 0941545784 GRITMAN MED… 46 86 81 54 #> 9 1052612785 COMMUNITY M… 65 84 80 58 #> 10 1254237779 OUR LADY OF… 61 NA NA 65 #> # ℹ 85 more rows #> # ℹ 2 more variables: CAHPS_GRP_8 , CAHPS_GRP_12 cms_patient_care %>% pivot_wider( names_from = type, values_from = score ) #> # A tibble: 126 × 5 #> ccn facility_name measure_abbr denominator observed #> #> 1 011500 BAPTIST HOSPICE beliefs_add… 202 100 #> 2 011500 BAPTIST HOSPICE composite_p… 202 88.1 #> 3 011500 BAPTIST HOSPICE dyspena_tre… 110 99.1 #> 4 011500 BAPTIST HOSPICE dyspnea_scr… 202 100 #> 5 011500 BAPTIST HOSPICE opioid_bowel 61 100 #> 6 011500 BAPTIST HOSPICE pain_assess… 107 100 #> 7 011500 BAPTIST HOSPICE pain_screen… 202 88.6 #> 8 011500 BAPTIST HOSPICE treat_pref 202 100 #> 9 011500 BAPTIST HOSPICE visits_immi… 232 96.1 #> 10 011501 SOUTHERNCARE NEW BEACON N. 
BI… beliefs_add… 525 100 #> # ℹ 116 more rows cms_patient_care %>% pivot_wider( names_from = measure_abbr, values_from = score ) #> # A tibble: 28 × 12 #> ccn facility_name type beliefs_addressed composite_process #> #> 1 011500 BAPTIST HOSPICE deno… 202 202 #> 2 011500 BAPTIST HOSPICE obse… 100 88.1 #> 3 011501 SOUTHERNCARE NEW BEAC… deno… 525 525 #> 4 011501 SOUTHERNCARE NEW BEAC… obse… 100 100 #> 5 011502 COMFORT CARE COASTAL … deno… 295 295 #> 6 011502 COMFORT CARE COASTAL … obse… 100 99.3 #> 7 011503 SAAD HOSPICE SERVICES deno… 694 694 #> 8 011503 SAAD HOSPICE SERVICES obse… 99.9 96 #> 9 011505 HOSPICE FAMILY CARE deno… 600 600 #> 10 011505 HOSPICE FAMILY CARE obse… 97.8 92 #> # ℹ 18 more rows #> # ℹ 7 more variables: dyspena_treatment , dyspnea_screening , #> # opioid_bowel , pain_assessment , pain_screening , #> # treat_pref , visits_imminent cms_patient_care %>% pivot_wider( names_from = c(measure_abbr, type), values_from = score ) #> # A tibble: 14 × 20 #> ccn facility_name beliefs_addressed_de…¹ beliefs_addressed_ob…² #> #> 1 011500 BAPTIST HOSPICE 202 100 #> 2 011501 SOUTHERNCARE NEW … 525 100 #> 3 011502 COMFORT CARE COAS… 295 100 #> 4 011503 SAAD HOSPICE SERV… 694 99.9 #> 5 011505 HOSPICE FAMILY CA… 600 97.8 #> 6 011506 SOUTHERNCARE NEW … 589 100 #> 7 011508 SOUTHERNCARE NEW … 420 100 #> 8 011510 CULLMAN REGIONAL … 54 100 #> 9 011511 HOSPICE OF THE VA… 179 100 #> 10 011512 SOUTHERNCARE NEW … 396 100 #> 11 011513 SHEPHERD'S COVE H… 335 99.1 #> 12 011514 ST VINCENT'S HOSP… 210 100 #> 13 011516 HOSPICE OF LIMEST… 103 100 #> 14 011517 HOSPICE OF WEST A… 400 99.8 #> # ℹ abbreviated names: ¹​beliefs_addressed_denominator, #> # ²​beliefs_addressed_observed #> # ℹ 16 more variables: composite_process_denominator , #> # composite_process_observed , #> # dyspena_treatment_denominator , #> # dyspena_treatment_observed , #> # dyspnea_screening_denominator , …"},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":null,"dir":"Reference","previous_headings":"","what":"Complete a data frame with missing combinations of data — complete","title":"Complete a data frame with missing combinations of data — complete","text":"Turns implicit missing values explicit missing values. wrapper around expand(), dplyr::full_join() replace_na() useful completing missing combinations data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Complete a data frame with missing combinations of data — complete","text":"","code":"complete(data, ..., fill = list(), explicit = TRUE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Complete a data frame with missing combinations of data — complete","text":"data data frame. ... Specification columns expand complete. Columns can atomic vectors lists. find unique combinations x, y z, including present data, supply variable separate argument: expand(df, x, y, z) complete(df, x, y, z). find combinations occur data, use nesting: expand(df, nesting(x, y, z)). can combine two forms. example, expand(df, nesting(school_id, student_id), date) produce row present school-student combination possible dates. used factors, expand() complete() use full set levels, just appear data. want use values seen data, use forcats::fct_drop(). used continuous variables, may need fill values appear data: use expressions like year = 2010:2020 year = full_seq(year,1). 
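The NAs in the cms_patient_experience pivot above appear when an organisation is missing a measure; pivot_wider() also accepts values_fill when a placeholder is preferred. An illustrative variation on the example above (the choice of 0 as placeholder is an assumption, not part of the original example):

library(tidyr)
cms_patient_experience %>%
  pivot_wider(
    id_cols = starts_with("org"),
    names_from = measure_cd,
    values_from = prf_rate,
    values_fill = 0
  )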
fill named list variable supplies single value use instead NA missing combinations. explicit implicit (newly created) explicit (pre-existing) missing values filled fill? default, TRUE, set FALSE limit fill implicit missing values.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Complete a data frame with missing combinations of data — complete","text":"grouped data frames created dplyr::group_by(), complete() operates within group. , complete grouping column.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/complete.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Complete a data frame with missing combinations of data — complete","text":"","code":"df <- tibble( group = c(1:2, 1, 2), item_id = c(1:2, 2, 3), item_name = c(\"a\", \"a\", \"b\", \"b\"), value1 = c(1, NA, 3, 4), value2 = 4:7 ) df #> # A tibble: 4 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 2 2 a NA 5 #> 3 1 2 b 3 6 #> 4 2 3 b 4 7 # Combinations -------------------------------------------------------------- # Generate all possible combinations of `group`, `item_id`, and `item_name` # (whether or not they appear in the data) df %>% complete(group, item_id, item_name) #> # A tibble: 12 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 1 b NA NA #> 3 1 2 a NA NA #> 4 1 2 b 3 6 #> 5 1 3 a NA NA #> 6 1 3 b NA NA #> 7 2 1 a NA NA #> 8 2 1 b NA NA #> 9 2 2 a NA 5 #> 10 2 2 b NA NA #> 11 2 3 a NA NA #> 12 2 3 b 4 7 # Cross all possible `group` values with the unique pairs of # `(item_id, item_name)` that already exist in the data df %>% complete(group, nesting(item_id, item_name)) #> # A tibble: 8 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 2 a NA NA #> 3 1 2 b 3 6 #> 4 1 3 b NA NA #> 5 2 1 a NA NA #> 6 2 2 a NA 5 #> 7 2 2 b NA NA #> 8 2 3 b 4 7 # Within each `group`, generate all possible combinations of # `item_id` and `item_name` that occur in that group df %>% dplyr::group_by(group) %>% complete(item_id, item_name) #> # A tibble: 8 × 5 #> # Groups: group [2] #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 1 b NA NA #> 3 1 2 a NA NA #> 4 1 2 b 3 6 #> 5 2 2 a NA 5 #> 6 2 2 b NA NA #> 7 2 3 a NA NA #> 8 2 3 b 4 7 # Supplying values for new rows --------------------------------------------- # Use `fill` to replace NAs with some value. By default, affects both new # (implicit) and pre-existing (explicit) missing values. df %>% complete( group, nesting(item_id, item_name), fill = list(value1 = 0, value2 = 99) ) #> # A tibble: 8 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 2 a 0 99 #> 3 1 2 b 3 6 #> 4 1 3 b 0 99 #> 5 2 1 a 0 99 #> 6 2 2 a 0 5 #> 7 2 2 b 0 99 #> 8 2 3 b 4 7 # Limit the fill to only the newly created (i.e. 
previously implicit) # missing values with `explicit = FALSE` df %>% complete( group, nesting(item_id, item_name), fill = list(value1 = 0, value2 = 99), explicit = FALSE ) #> # A tibble: 8 × 5 #> group item_id item_name value1 value2 #> #> 1 1 1 a 1 4 #> 2 1 2 a 0 99 #> 3 1 2 b 3 6 #> 4 1 3 b 0 99 #> 5 2 1 a 0 99 #> 6 2 2 a NA 5 #> 7 2 2 b 0 99 #> 8 2 3 b 4 7"},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":null,"dir":"Reference","previous_headings":"","what":"Completed construction in the US in 2018 — construction","title":"Completed construction in the US in 2018 — construction","text":"Completed construction US 2018","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Completed construction in the US in 2018 — construction","text":"","code":"construction"},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Completed construction in the US in 2018 — construction","text":"dataset variables: Year,Month Record date 1 unit, 2 4 units, 5 units mote Number completed units size Northeast,Midwest,South,West Number completed units region","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/construction.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Completed construction in the US in 2018 — construction","text":"Completions \"New Residential Construction\" found Table 5 https://www.census.gov/construction/nrc/xls/newresconst.xls (downloaded March 2019)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/deprecated-se.html","id":null,"dir":"Reference","previous_headings":"","what":"Deprecated SE versions of main verbs — deprecated-se","title":"Deprecated SE versions of main verbs — deprecated-se","text":"tidyr used offer twin versions verb suffixed underscore. versions standard evaluation (SE) semantics: rather taking arguments code, like NSE verbs, took arguments value. purpose make possible program tidyr. However, tidyr now uses tidy evaluation semantics. NSE verbs still capture arguments, can now unquote parts arguments. offers full programmability NSE verbs. Thus, underscored versions now superfluous. Unquoting triggers immediate evaluation operand inlines result within captured expression. result can value expression evaluated later rest argument. See vignette(\"programming\", \"dplyr\") information.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/deprecated-se.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Deprecated SE versions of main verbs — deprecated-se","text":"","code":"complete_(data, cols, fill = list(), ...) drop_na_(data, vars) expand_(data, dots, ...) crossing_(x) nesting_(x) extract_( data, col, into, regex = \"([[:alnum:]]+)\", remove = TRUE, convert = FALSE, ... ) fill_(data, fill_cols, .direction = c(\"down\", \"up\")) gather_( data, key_col, value_col, gather_cols, na.rm = FALSE, convert = FALSE, factor_key = FALSE ) nest_(...) separate_rows_(data, cols, sep = \"[^[:alnum:].]+\", convert = FALSE) separate_( data, col, into, sep = \"[^[:alnum:]]+\", remove = TRUE, convert = FALSE, extra = \"warn\", fill = \"warn\", ... 
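# A sketch following the complete() entry above: its arguments section
# suggests completing a continuous variable with full_seq() (or an explicit
# range such as year = 2010:2020). The tibble below is hypothetical.
library(tidyr)
sightings <- tibble(year = c(2010, 2011, 2013), n = c(3, 5, 2))
# 2012 is implicitly missing; complete the year sequence and fill n with 0
sightings %>% complete(year = full_seq(year, 1), fill = list(n = 0))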
) spread_( data, key_col, value_col, fill = NA, convert = FALSE, drop = TRUE, sep = NULL ) unite_(data, col, from, sep = \"_\", remove = TRUE) unnest_(...)"},{"path":"https://tidyr.tidyverse.org/dev/reference/deprecated-se.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Deprecated SE versions of main verbs — deprecated-se","text":"data data frame fill named list variable supplies single value use instead NA missing combinations. ... Specification columns expand complete. Columns can atomic vectors lists. find unique combinations x, y z, including present data, supply variable separate argument: expand(df, x, y, z) complete(df, x, y, z). find combinations occur data, use nesting: expand(df, nesting(x, y, z)). can combine two forms. example, expand(df, nesting(school_id, student_id), date) produce row present school-student combination possible dates. used factors, expand() complete() use full set levels, just appear data. want use values seen data, use forcats::fct_drop(). used continuous variables, may need fill values appear data: use expressions like year = 2010:2020 year = full_seq(year,1). vars, cols, col Name columns. x nesting_ crossing_ list variables. Names new variables create character vector. Use NA omit variable output. regex string representing regular expression used extract desired values. one group (defined ()) element . remove TRUE, remove input column output data frame. convert TRUE, run type.convert() .= TRUE new columns. useful component columns integer, numeric logical. NB: cause string \"NA\"s converted NAs. fill_cols Character vector column names. .direction Direction fill missing values. Currently either \"\" (default), \"\", \"downup\" (.e. first ) \"updown\" (first ). key_col, value_col Strings giving names key value cols. gather_cols Character vector giving column names gathered pair key-value columns. na.rm TRUE, remove rows output value column NA. factor_key FALSE, default, key values stored character vector. TRUE, stored factor, preserves original ordering columns. sep Separator delimiting collapsed values. extra sep character vector, controls happens many pieces. three valid options: \"warn\" (default): emit warning drop extra values. \"drop\": drop extra values without warning. \"merge\": splits length() times drop FALSE, keep factor levels appear data, filling missing combinations fill. Names existing columns character vector","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":null,"dir":"Reference","previous_headings":"","what":"Drop rows containing missing values — drop_na","title":"Drop rows containing missing values — drop_na","text":"drop_na() drops rows column specified ... contains missing value.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Drop rows containing missing values — drop_na","text":"","code":"drop_na(data, ...)"},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Drop rows containing missing values — drop_na","text":"data data frame. ... Columns inspect missing values. 
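Following the deprecated-se entry above: the underscored verbs took column names as strings, and the modern replacement is to pass a character vector through a tidyselect helper such as all_of(). A minimal sketch with a made-up data frame:

library(tidyr)
df <- tibble(g = c(1, 1, 2), a = c(1, NA, 3), b = c(NA, 2, 3))
cols <- c("a", "b")
# Previously: drop_na_(df, vars = cols)
df %>% drop_na(all_of(cols))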
empty, columns used.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Drop rows containing missing values — drop_na","text":"Another way interpret drop_na() keeps \"complete\" rows (rows contain missing values). Internally, completeness computed vctrs::vec_detect_complete().","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/drop_na.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Drop rows containing missing values — drop_na","text":"","code":"df <- tibble(x = c(1, 2, NA), y = c(\"a\", NA, \"b\")) df %>% drop_na() #> # A tibble: 1 × 2 #> x y #> #> 1 1 a df %>% drop_na(x) #> # A tibble: 2 × 2 #> x y #> #> 1 1 a #> 2 2 NA vars <- \"y\" df %>% drop_na(x, any_of(vars)) #> # A tibble: 1 × 2 #> x y #> #> 1 1 a"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":null,"dir":"Reference","previous_headings":"","what":"Expand data frame to include all possible combinations of values — expand","title":"Expand data frame to include all possible combinations of values — expand","text":"expand() generates combination variables found dataset. paired nesting() crossing() helpers. crossing() wrapper around expand_grid() de-duplicates sorts inputs; nesting() helper finds combinations already present data. expand() often useful conjunction joins: use right_join() convert implicit missing values explicit missing values (e.g., fill gaps data frame). use anti_join() figure combinations missing (e.g., identify gaps data frame).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Expand data frame to include all possible combinations of values — expand","text":"","code":"expand(data, ..., .name_repair = \"check_unique\") crossing(..., .name_repair = \"check_unique\") nesting(..., .name_repair = \"check_unique\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Expand data frame to include all possible combinations of values — expand","text":"data data frame. ... Specification columns expand complete. Columns can atomic vectors lists. find unique combinations x, y z, including present data, supply variable separate argument: expand(df, x, y, z) complete(df, x, y, z). find combinations occur data, use nesting: expand(df, nesting(x, y, z)). can combine two forms. example, expand(df, nesting(school_id, student_id), date) produce row present school-student combination possible dates. used factors, expand() complete() use full set levels, just appear data. want use values seen data, use forcats::fct_drop(). used continuous variables, may need fill values appear data: use expressions like year = 2010:2020 year = full_seq(year,1). .name_repair Treatment problematic column names: \"minimal\": name repair checks, beyond basic existence, \"unique\": Make sure names unique empty, \"check_unique\": (default value), name repair, check unique, \"universal\": Make names unique syntactic function: apply custom name repair (e.g., .name_repair = make.names names style base R). purrr-style anonymous function, see rlang::as_function() argument passed repair vctrs::vec_as_names(). 
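To complement the expand() entry above, a small sketch of the crossing() and nesting() helpers used on their own; the input vectors are made up.

library(tidyr)
# crossing(): every combination of the inputs, de-duplicated and sorted
crossing(type = c("apple", "orange"), year = c(2010, 2011))
# nesting(): only the combinations that actually occur together
nesting(type = c("apple", "orange", "apple"), year = c(2010, 2010, 2010))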
See details terms strategies used enforce .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Expand data frame to include all possible combinations of values — expand","text":"grouped data frames created dplyr::group_by(), expand() operates within group. , expand grouping column.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/expand.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Expand data frame to include all possible combinations of values — expand","text":"","code":"# Finding combinations ------------------------------------------------------ fruits <- tibble( type = c(\"apple\", \"orange\", \"apple\", \"orange\", \"orange\", \"orange\"), year = c(2010, 2010, 2012, 2010, 2011, 2012), size = factor( c(\"XS\", \"S\", \"M\", \"S\", \"S\", \"M\"), levels = c(\"XS\", \"S\", \"M\", \"L\") ), weights = rnorm(6, as.numeric(size) + 2) ) # All combinations, including factor levels that are not used fruits %>% expand(type) #> # A tibble: 2 × 1 #> type #> #> 1 apple #> 2 orange fruits %>% expand(size) #> # A tibble: 4 × 1 #> size #> #> 1 XS #> 2 S #> 3 M #> 4 L fruits %>% expand(type, size) #> # A tibble: 8 × 2 #> type size #> #> 1 apple XS #> 2 apple S #> 3 apple M #> 4 apple L #> 5 orange XS #> 6 orange S #> 7 orange M #> 8 orange L fruits %>% expand(type, size, year) #> # A tibble: 24 × 3 #> type size year #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple S 2010 #> 5 apple S 2011 #> 6 apple S 2012 #> 7 apple M 2010 #> 8 apple M 2011 #> 9 apple M 2012 #> 10 apple L 2010 #> # ℹ 14 more rows # Only combinations that already appear in the data fruits %>% expand(nesting(type)) #> # A tibble: 2 × 1 #> type #> #> 1 apple #> 2 orange fruits %>% expand(nesting(size)) #> # A tibble: 3 × 1 #> size #> #> 1 XS #> 2 S #> 3 M fruits %>% expand(nesting(type, size)) #> # A tibble: 4 × 2 #> type size #> #> 1 apple XS #> 2 apple M #> 3 orange S #> 4 orange M fruits %>% expand(nesting(type, size, year)) #> # A tibble: 5 × 3 #> type size year #> #> 1 apple XS 2010 #> 2 apple M 2012 #> 3 orange S 2010 #> 4 orange S 2011 #> 5 orange M 2012 # Other uses ---------------------------------------------------------------- # Use with `full_seq()` to fill in values of continuous variables fruits %>% expand(type, size, full_seq(year, 1)) #> # A tibble: 24 × 3 #> type size `full_seq(year, 1)` #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple S 2010 #> 5 apple S 2011 #> 6 apple S 2012 #> 7 apple M 2010 #> 8 apple M 2011 #> 9 apple M 2012 #> 10 apple L 2010 #> # ℹ 14 more rows fruits %>% expand(type, size, 2010:2013) #> # A tibble: 32 × 3 #> type size `2010:2013` #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple XS 2013 #> 5 apple S 2010 #> 6 apple S 2011 #> 7 apple S 2012 #> 8 apple S 2013 #> 9 apple M 2010 #> 10 apple M 2011 #> # ℹ 22 more rows # Use `anti_join()` to determine which observations are missing all <- fruits %>% expand(type, size, year) all #> # A tibble: 24 × 3 #> type size year #> #> 1 apple XS 2010 #> 2 apple XS 2011 #> 3 apple XS 2012 #> 4 apple S 2010 #> 5 apple S 2011 #> 6 apple S 2012 #> 7 apple M 2010 #> 8 apple M 2011 #> 9 apple M 2012 #> 10 apple L 2010 #> # ℹ 14 more rows all %>% dplyr::anti_join(fruits) #> Joining with `by = join_by(type, size, year)` #> # A tibble: 19 × 3 #> type size year #> #> 1 apple XS 2011 #> 2 apple XS 2012 #> 3 
apple S 2010 #> 4 apple S 2011 #> 5 apple S 2012 #> 6 apple M 2010 #> 7 apple M 2011 #> 8 apple L 2010 #> 9 apple L 2011 #> 10 apple L 2012 #> 11 orange XS 2010 #> 12 orange XS 2011 #> 13 orange XS 2012 #> 14 orange S 2012 #> 15 orange M 2010 #> 16 orange M 2011 #> 17 orange L 2010 #> 18 orange L 2011 #> 19 orange L 2012 # Use with `right_join()` to fill in missing rows (like `complete()`) fruits %>% dplyr::right_join(all) #> Joining with `by = join_by(type, year, size)` #> # A tibble: 25 × 4 #> type year size weights #> #> 1 apple 2010 XS 1.60 #> 2 orange 2010 S 4.26 #> 3 apple 2012 M 2.56 #> 4 orange 2010 S 3.99 #> 5 orange 2011 S 4.62 #> 6 orange 2012 M 6.15 #> 7 apple 2011 XS NA #> 8 apple 2012 XS NA #> 9 apple 2010 S NA #> 10 apple 2011 S NA #> # ℹ 15 more rows # Use with `group_by()` to expand within each group fruits %>% dplyr::group_by(type) %>% expand(year, size) #> # A tibble: 20 × 3 #> # Groups: type [2] #> type year size #> #> 1 apple 2010 XS #> 2 apple 2010 S #> 3 apple 2010 M #> 4 apple 2010 L #> 5 apple 2012 XS #> 6 apple 2012 S #> 7 apple 2012 M #> 8 apple 2012 L #> 9 orange 2010 XS #> 10 orange 2010 S #> 11 orange 2010 M #> 12 orange 2010 L #> 13 orange 2011 XS #> 14 orange 2011 S #> 15 orange 2011 M #> 16 orange 2011 L #> 17 orange 2012 XS #> 18 orange 2012 S #> 19 orange 2012 M #> 20 orange 2012 L"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a tibble from all combinations of inputs — expand_grid","title":"Create a tibble from all combinations of inputs — expand_grid","text":"expand_grid() heavily motivated expand.grid(). Compared expand.grid(), : Produces sorted output (varying first column slowest, rather fastest). Returns tibble, data frame. Never converts strings factors. add additional attributes. Can expand generalised vector, including data frames.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a tibble from all combinations of inputs — expand_grid","text":"","code":"expand_grid(..., .name_repair = \"check_unique\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a tibble from all combinations of inputs — expand_grid","text":"... Name-value pairs. name become column name output. .name_repair Treatment problematic column names: \"minimal\": name repair checks, beyond basic existence, \"unique\": Make sure names unique empty, \"check_unique\": (default value), name repair, check unique, \"universal\": Make names unique syntactic function: apply custom name repair (e.g., .name_repair = make.names names style base R). purrr-style anonymous function, see rlang::as_function() argument passed repair vctrs::vec_as_names(). See details terms strategies used enforce .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a tibble from all combinations of inputs — expand_grid","text":"tibble one column input .... output one row combination inputs, .e. size equal product sizes inputs. 
implies input length 0, output zero rows.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/expand_grid.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a tibble from all combinations of inputs — expand_grid","text":"","code":"expand_grid(x = 1:3, y = 1:2) #> # A tibble: 6 × 2 #> x y #> #> 1 1 1 #> 2 1 2 #> 3 2 1 #> 4 2 2 #> 5 3 1 #> 6 3 2 expand_grid(l1 = letters, l2 = LETTERS) #> # A tibble: 676 × 2 #> l1 l2 #> #> 1 a A #> 2 a B #> 3 a C #> 4 a D #> 5 a E #> 6 a F #> 7 a G #> 8 a H #> 9 a I #> 10 a J #> # ℹ 666 more rows # Can also expand data frames expand_grid(df = tibble(x = 1:2, y = c(2, 1)), z = 1:3) #> # A tibble: 6 × 2 #> df$x $y z #> #> 1 1 2 1 #> 2 1 2 2 #> 3 1 2 3 #> 4 2 1 1 #> 5 2 1 2 #> 6 2 1 3 # And matrices expand_grid(x1 = matrix(1:4, nrow = 2), x2 = matrix(5:8, nrow = 2)) #> # A tibble: 4 × 2 #> x1[,1] [,2] x2[,1] [,2] #> #> 1 1 3 5 7 #> 2 1 3 6 8 #> 3 2 4 5 7 #> 4 2 4 6 8"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract a character column into multiple columns using regular expression groups — extract","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"extract() superseded favour separate_wider_regex() polished API better handling problems. Superseded functions go away, receive critical bug fixes. Given regular expression capturing groups, extract() turns group new column. groups match, input NA, output NA.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"","code":"extract( data, col, into, regex = \"([[:alnum:]]+)\", remove = TRUE, convert = FALSE, ... )"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"data data frame. col Column expand. Names new variables create character vector. Use NA omit variable output. regex string representing regular expression used extract desired values. one group (defined ()) element . remove TRUE, remove input column output data frame. convert TRUE, run type.convert() .= TRUE new columns. useful component columns integer, numeric logical. NB: cause string \"NA\"s converted NAs. ... 
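A quick sketch of the size rule stated in the expand_grid() entry above: the number of output rows is the product of the input sizes, so any zero-length input yields zero rows.

library(tidyr)
# 3 x 2 = 6 rows
expand_grid(x = 1:3, y = c("a", "b"))
# A zero-length input gives a zero-row (but fully typed) result
expand_grid(x = 1:3, y = character())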
Additional arguments passed methods.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/extract.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract a character column into multiple columns using regular expression groups — extract","text":"","code":"df <- tibble(x = c(NA, \"a-b\", \"a-d\", \"b-c\", \"d-e\")) df %>% extract(x, \"A\") #> # A tibble: 5 × 1 #> A #> #> 1 NA #> 2 a #> 3 a #> 4 b #> 5 d df %>% extract(x, c(\"A\", \"B\"), \"([[:alnum:]]+)-([[:alnum:]]+)\") #> # A tibble: 5 × 2 #> A B #> #> 1 NA NA #> 2 a b #> 3 a d #> 4 b c #> 5 d e # Now recommended df %>% separate_wider_regex( x, patterns = c(A = \"[[:alnum:]]+\", \"-\", B = \"[[:alnum:]]+\") ) #> # A tibble: 5 × 2 #> A B #> #> 1 NA NA #> 2 a b #> 3 a d #> 4 b c #> 5 d e # If no match, NA: df %>% extract(x, c(\"A\", \"B\"), \"([a-d]+)-([a-d]+)\") #> # A tibble: 5 × 2 #> A B #> #> 1 NA NA #> 2 a b #> 3 a d #> 4 b c #> 5 NA NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract_numeric.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract numeric component of variable. — extract_numeric","title":"Extract numeric component of variable. — extract_numeric","text":"DEPRECATED: please use readr::parse_number() instead.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/extract_numeric.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract numeric component of variable. — extract_numeric","text":"","code":"extract_numeric(x)"},{"path":"https://tidyr.tidyverse.org/dev/reference/extract_numeric.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract numeric component of variable. — extract_numeric","text":"x character vector (factor).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":null,"dir":"Reference","previous_headings":"","what":"Fill in missing values with previous or next value — fill","title":"Fill in missing values with previous or next value — fill","text":"Fills missing values selected columns using next previous entry. useful common output format values repeated, recorded change.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fill in missing values with previous or next value — fill","text":"","code":"fill(data, ..., .direction = c(\"down\", \"up\", \"downup\", \"updown\"))"},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fill in missing values with previous or next value — fill","text":"data data frame. ... Columns fill. .direction Direction fill missing values. Currently either \"\" (default), \"\", \"downup\" (.e. 
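Supplementing the extract() entry above: the convert argument runs type.convert() on the new columns, which helps when a captured group is really a number. A hypothetical example:

library(tidyr)
df <- tibble(x = c("a1", "b2", "c10"))
# Without convert, both new columns would be character
df %>% extract(x, c("letter", "number"), "([a-z]+)([0-9]+)", convert = TRUE)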
first ) \"updown\" (first ).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fill in missing values with previous or next value — fill","text":"Missing values replaced atomic vectors; NULLs replaced lists.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Fill in missing values with previous or next value — fill","text":"grouped data frames created dplyr::group_by(), fill() applied within group, meaning fill across group boundaries.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fill.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fill in missing values with previous or next value — fill","text":"","code":"# direction = \"down\" -------------------------------------------------------- # Value (year) is recorded only when it changes sales <- tibble::tribble( ~quarter, ~year, ~sales, \"Q1\", 2000, 66013, \"Q2\", NA, 69182, \"Q3\", NA, 53175, \"Q4\", NA, 21001, \"Q1\", 2001, 46036, \"Q2\", NA, 58842, \"Q3\", NA, 44568, \"Q4\", NA, 50197, \"Q1\", 2002, 39113, \"Q2\", NA, 41668, \"Q3\", NA, 30144, \"Q4\", NA, 52897, \"Q1\", 2004, 32129, \"Q2\", NA, 67686, \"Q3\", NA, 31768, \"Q4\", NA, 49094 ) # `fill()` defaults to replacing missing data from top to bottom sales %>% fill(year) #> # A tibble: 16 × 3 #> quarter year sales #> #> 1 Q1 2000 66013 #> 2 Q2 2000 69182 #> 3 Q3 2000 53175 #> 4 Q4 2000 21001 #> 5 Q1 2001 46036 #> 6 Q2 2001 58842 #> 7 Q3 2001 44568 #> 8 Q4 2001 50197 #> 9 Q1 2002 39113 #> 10 Q2 2002 41668 #> 11 Q3 2002 30144 #> 12 Q4 2002 52897 #> 13 Q1 2004 32129 #> 14 Q2 2004 67686 #> 15 Q3 2004 31768 #> 16 Q4 2004 49094 # direction = \"up\" ---------------------------------------------------------- # Value (pet_type) is missing above tidy_pets <- tibble::tribble( ~rank, ~pet_type, ~breed, 1L, NA, \"Boston Terrier\", 2L, NA, \"Retrievers (Labrador)\", 3L, NA, \"Retrievers (Golden)\", 4L, NA, \"French Bulldogs\", 5L, NA, \"Bulldogs\", 6L, \"Dog\", \"Beagles\", 1L, NA, \"Persian\", 2L, NA, \"Maine Coon\", 3L, NA, \"Ragdoll\", 4L, NA, \"Exotic\", 5L, NA, \"Siamese\", 6L, \"Cat\", \"American Short\" ) # For values that are missing above you can use `.direction = \"up\"` tidy_pets %>% fill(pet_type, .direction = \"up\") #> # A tibble: 12 × 3 #> rank pet_type breed #> #> 1 1 Dog Boston Terrier #> 2 2 Dog Retrievers (Labrador) #> 3 3 Dog Retrievers (Golden) #> 4 4 Dog French Bulldogs #> 5 5 Dog Bulldogs #> 6 6 Dog Beagles #> 7 1 Cat Persian #> 8 2 Cat Maine Coon #> 9 3 Cat Ragdoll #> 10 4 Cat Exotic #> 11 5 Cat Siamese #> 12 6 Cat American Short # direction = \"downup\" ------------------------------------------------------ # Value (n_squirrels) is missing above and below within a group squirrels <- tibble::tribble( ~group, ~name, ~role, ~n_squirrels, 1, \"Sam\", \"Observer\", NA, 1, \"Mara\", \"Scorekeeper\", 8, 1, \"Jesse\", \"Observer\", NA, 1, \"Tom\", \"Observer\", NA, 2, \"Mike\", \"Observer\", NA, 2, \"Rachael\", \"Observer\", NA, 2, \"Sydekea\", \"Scorekeeper\", 14, 2, \"Gabriela\", \"Observer\", NA, 3, \"Derrick\", \"Observer\", NA, 3, \"Kara\", \"Scorekeeper\", 9, 3, \"Emily\", \"Observer\", NA, 3, \"Danielle\", \"Observer\", NA ) # The values are inconsistently missing by position within the group # Use .direction = \"downup\" to fill missing values in both directions squirrels %>% 
dplyr::group_by(group) %>% fill(n_squirrels, .direction = \"downup\") %>% dplyr::ungroup() #> # A tibble: 12 × 4 #> group name role n_squirrels #> #> 1 1 Sam Observer 8 #> 2 1 Mara Scorekeeper 8 #> 3 1 Jesse Observer 8 #> 4 1 Tom Observer 8 #> 5 2 Mike Observer 14 #> 6 2 Rachael Observer 14 #> 7 2 Sydekea Scorekeeper 14 #> 8 2 Gabriela Observer 14 #> 9 3 Derrick Observer 9 #> 10 3 Kara Scorekeeper 9 #> 11 3 Emily Observer 9 #> 12 3 Danielle Observer 9 # Using `.direction = \"updown\"` accomplishes the same goal in this example"},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":null,"dir":"Reference","previous_headings":"","what":"Fish encounters — fish_encounters","title":"Fish encounters — fish_encounters","text":"Information fish swimming river: station represents autonomous monitor records tagged fish seen location. Fish travel one direction (migrating downstream). Information misses just important hits, directly recorded form data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fish encounters — fish_encounters","text":"","code":"fish_encounters"},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Fish encounters — fish_encounters","text":"dataset variables: fish Fish identifier station Measurement station seen fish seen? (1 yes, true rows)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/fish_encounters.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Fish encounters — fish_encounters","text":"Dataset provided Myfanwy Johnston; details https://fishsciences.github.io/post/visualizing-fish-encounter-histories/","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":null,"dir":"Reference","previous_headings":"","what":"Create the full sequence of values in a vector — full_seq","title":"Create the full sequence of values in a vector — full_seq","text":"useful want fill missing values observed . example, full_seq(c(1, 2, 4, 6), 1) return 1:6.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create the full sequence of values in a vector — full_seq","text":"","code":"full_seq(x, period, tol = 1e-06)"},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create the full sequence of values in a vector — full_seq","text":"x numeric vector. period Gap observation. existing data checked ensure actually periodicity. tol Numerical tolerance checking periodicity.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/full_seq.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create the full sequence of values in a vector — full_seq","text":"","code":"full_seq(c(1, 2, 4, 5, 10), 1) #> [1] 1 2 3 4 5 6 7 8 9 10"},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":null,"dir":"Reference","previous_headings":"","what":"Gather columns into key-value pairs — gather","title":"Gather columns into key-value pairs — gather","text":"Development gather() complete, new code recommend switching pivot_longer(), easier use, featureful, still active development. 
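A short sketch of the full_seq() entry above with a period other than 1; as noted there, the existing values are checked against the requested periodicity.

library(tidyr)
# Fills in the gaps on a step-2 grid: 2 4 6 8 10
full_seq(c(2, 6, 10), 2)
# Values that do not fall on the period's grid should error
try(full_seq(c(1, 2, 4), 2))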
df %>% gather(\"key\", \"value\", x, y, z) equivalent df %>% pivot_longer(c(x, y, z), names_to = \"key\", values_to = \"value\") See details vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gather columns into key-value pairs — gather","text":"","code":"gather( data, key = \"key\", value = \"value\", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE )"},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Gather columns into key-value pairs — gather","text":"data data frame. key, value Names new key value columns, strings symbols. argument passed expression supports quasiquotation (can unquote strings symbols). name captured expression rlang::ensym() (note kind interface symbols represent actual objects now discouraged tidyverse; support backward compatibility). ... selection columns. empty, variables selected. can supply bare variable names, select variables x z x:z, exclude y -y. options, see dplyr::select() documentation. See also section selection rules . na.rm TRUE, remove rows output value column NA. convert TRUE automatically run type.convert() key column. useful column types actually numeric, integer, logical. factor_key FALSE, default, key values stored character vector. TRUE, stored factor, preserves original ordering columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"rules-for-selection","dir":"Reference","previous_headings":"","what":"Rules for selection","title":"Gather columns into key-value pairs — gather","text":"Arguments selecting columns passed tidyselect::vars_select() treated specially. Unlike verbs, selecting functions make strict distinction data expressions context expressions. data expression either bare name like x expression like x:y c(x, y). data expression, can refer columns data frame. Everything else context expression can refer objects defined <-. instance, col1:col3 data expression refers data columns, seq(start, end) context expression refers objects contexts. need refer contextual objects data expression, can use all_of() any_of(). functions used select data-variables whose names stored env-variable. instance, all_of() selects variables listed character vector . 
details, see tidyselect::select_helpers() documentation.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/gather.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gather columns into key-value pairs — gather","text":"","code":"# From https://stackoverflow.com/questions/1181060 stocks <- tibble( time = as.Date(\"2009-01-01\") + 0:9, X = rnorm(10, 0, 1), Y = rnorm(10, 0, 2), Z = rnorm(10, 0, 4) ) gather(stocks, \"stock\", \"price\", -time) #> # A tibble: 30 × 3 #> time stock price #> #> 1 2009-01-01 X -1.82 #> 2 2009-01-02 X -0.247 #> 3 2009-01-03 X -0.244 #> 4 2009-01-04 X -0.283 #> 5 2009-01-05 X -0.554 #> 6 2009-01-06 X 0.629 #> 7 2009-01-07 X 2.07 #> 8 2009-01-08 X -1.63 #> 9 2009-01-09 X 0.512 #> 10 2009-01-10 X -1.86 #> # ℹ 20 more rows stocks %>% gather(\"stock\", \"price\", -time) #> # A tibble: 30 × 3 #> time stock price #> #> 1 2009-01-01 X -1.82 #> 2 2009-01-02 X -0.247 #> 3 2009-01-03 X -0.244 #> 4 2009-01-04 X -0.283 #> 5 2009-01-05 X -0.554 #> 6 2009-01-06 X 0.629 #> 7 2009-01-07 X 2.07 #> 8 2009-01-08 X -1.63 #> 9 2009-01-09 X 0.512 #> 10 2009-01-10 X -1.86 #> # ℹ 20 more rows # get first observation for each Species in iris data -- base R mini_iris <- iris[c(1, 51, 101), ] # gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width gather(mini_iris, key = \"flower_att\", value = \"measurement\", Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) #> Species flower_att measurement #> 1 setosa Sepal.Length 5.1 #> 2 versicolor Sepal.Length 7.0 #> 3 virginica Sepal.Length 6.3 #> 4 setosa Sepal.Width 3.5 #> 5 versicolor Sepal.Width 3.2 #> 6 virginica Sepal.Width 3.3 #> 7 setosa Petal.Length 1.4 #> 8 versicolor Petal.Length 4.7 #> 9 virginica Petal.Length 6.0 #> 10 setosa Petal.Width 0.2 #> 11 versicolor Petal.Width 1.4 #> 12 virginica Petal.Width 2.5 # same result but less verbose gather(mini_iris, key = \"flower_att\", value = \"measurement\", -Species) #> Species flower_att measurement #> 1 setosa Sepal.Length 5.1 #> 2 versicolor Sepal.Length 7.0 #> 3 virginica Sepal.Length 6.3 #> 4 setosa Sepal.Width 3.5 #> 5 versicolor Sepal.Width 3.2 #> 6 virginica Sepal.Width 3.3 #> 7 setosa Petal.Length 1.4 #> 8 versicolor Petal.Length 4.7 #> 9 virginica Petal.Length 6.0 #> 10 setosa Petal.Width 0.2 #> 11 versicolor Petal.Width 1.4 #> 12 virginica Petal.Width 2.5"},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":null,"dir":"Reference","previous_headings":"","what":"Hoist values out of list-columns — hoist","title":"Hoist values out of list-columns — hoist","text":"hoist() allows selectively pull components list-column top-level columns, using syntax purrr::pluck(). Learn vignette(\"rectangle\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Hoist values out of list-columns — hoist","text":"","code":"hoist( .data, .col, ..., .remove = TRUE, .simplify = TRUE, .ptype = NULL, .transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Hoist values out of list-columns — hoist","text":".data data frame. .col List-column extract components . ... Components .col turn columns form col_name = \"pluck_specification\". can pluck name character vector, position integer vector, combination two list. See purrr::pluck() details. 
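As the gather() entry above recommends pivot_longer() for new code, here is the pivot_longer() form of the mini_iris example shown there:

library(tidyr)
mini_iris <- iris[c(1, 51, 101), ]
mini_iris %>%
  pivot_longer(
    !Species,
    names_to = "flower_att",
    values_to = "measurement"
  )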
column names must unique call hoist(), although existing columns name overwritten. plucking single string can choose omit name, .e. hoist(df, col, \"x\") short-hand hoist(df, col, x = \"x\"). .remove TRUE, default, remove extracted components .col. ensures value lives one place. components removed .col, .col removed result entirely. .simplify TRUE, attempt simplify lists length-1 vectors atomic vector. Can also named list containing TRUE FALSE declaring whether attempt simplify particular column. named list provided, default unspecified columns TRUE. .ptype Optionally, named list prototypes declaring desired output type component. Alternatively, single empty prototype can supplied, applied components. Use argument want check element type expect simplifying. ptype specified, simplify = FALSE simplification possible, list-column returned element type ptype. .transform Optionally, named list transformation functions applied component. Alternatively, single function can supplied, applied components. Use argument want transform parse individual elements extracted. ptype transform supplied, transform applied ptype.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/hoist.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Hoist values out of list-columns — hoist","text":"","code":"df <- tibble( character = c(\"Toothless\", \"Dory\"), metadata = list( list( species = \"dragon\", color = \"black\", films = c( \"How to Train Your Dragon\", \"How to Train Your Dragon 2\", \"How to Train Your Dragon: The Hidden World\" ) ), list( species = \"blue tang\", color = \"blue\", films = c(\"Finding Nemo\", \"Finding Dory\") ) ) ) df #> # A tibble: 2 × 2 #> character metadata #> #> 1 Toothless #> 2 Dory # Extract only specified components df %>% hoist(metadata, \"species\", first_film = list(\"films\", 1L), third_film = list(\"films\", 3L) ) #> # A tibble: 2 × 5 #> character species first_film third_film metadata #> #> 1 Toothless dragon How to Train Your Dragon How to Train … #> 2 Dory blue tang Finding Nemo NA "},{"path":"https://tidyr.tidyverse.org/dev/reference/household.html","id":null,"dir":"Reference","previous_headings":"","what":"Household data — household","title":"Household data — household","text":"dataset based example vignette(\"datatable-reshape\", package = \"data.table\")","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/household.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Household data — household","text":"","code":"household"},{"path":"https://tidyr.tidyverse.org/dev/reference/household.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Household data — household","text":"data frame 5 rows 5 columns: family Family identifier dob_child1 Date birth first child dob_child2 Date birth second child name_child1 Name first child name_child2 Name second child","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":null,"dir":"Reference","previous_headings":"","what":"Nest rows into a list-column of data frames — nest","title":"Nest rows into a list-column of data frames — nest","text":"Nesting creates list-column data frames; unnesting flattens back regular columns. Nesting implicitly summarising operation: get one row group defined non-nested columns. useful conjunction summaries work whole datasets, notably models. 
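Building on the hoist() entry above, a sketch of .remove = FALSE (keep the original list-column) and of supplying a single function to .transform; the small list-column below is made up.

library(tidyr)
df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(species = "dragon", color = "black"),
    list(species = "blue tang", color = "blue")
  )
)
# Keep metadata, and upper-case the extracted colour on the way out
df %>%
  hoist(
    metadata,
    colour = "color",
    .remove = FALSE,
    .transform = toupper
  )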
Learn vignette(\"nest\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Nest rows into a list-column of data frames — nest","text":"","code":"nest(.data, ..., .by = NULL, .key = NULL, .names_sep = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Nest rows into a list-column of data frames — nest","text":".data data frame. ... Columns nest; appear inner data frames. Specified using name-variable pairs form new_col = c(col1, col2, col3). right hand side can valid tidyselect expression. supplied, ... derived columns selected ., use column name .key. : previously write df %>% nest(x, y, z). Convert df %>% nest(data = c(x, y, z)). . Columns nest ; remain outer data frame. .can used place conjunction columns supplied .... supplied, .derived columns selected .... .key name resulting nested column. applicable ... specified, .e. case df %>% nest(.= x). NULL, \"data\" used default. .names_sep NULL, default, inner names come former outer names. string, new inner names use outer names names_sep automatically stripped. makes names_sep roughly symmetric nesting unnesting.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Nest rows into a list-column of data frames — nest","text":"neither ... .supplied, nest() nest variables, use column name supplied .key.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"new-syntax","dir":"Reference","previous_headings":"","what":"New syntax","title":"Nest rows into a list-column of data frames — nest","text":"tidyr 1.0.0 introduced new syntax nest() unnest() designed similar functions. Converting new syntax straightforward (guided message receive) just need run old analysis, can easily revert previous behaviour using nest_legacy() unnest_legacy() follows:","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"grouped-data-frames","dir":"Reference","previous_headings":"","what":"Grouped data frames","title":"Nest rows into a list-column of data frames — nest","text":"df %>% nest(data = c(x, y)) specifies columns nested; .e. columns appear inner data frame. df %>% nest(.= c(x, y)) specifies columns nest ; .e. columns remain outer data frame. alternative way achieve latter nest() grouped data frame created dplyr::group_by(). grouping variables remain outer data frame others nested. result preserves grouping input. Variables supplied nest() override grouping variables df %>% group_by(x, y) %>% nest(data = !z) equivalent df %>% nest(data = !z). supply .grouped data frame, groups already represent nesting .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Nest rows into a list-column of data frames — nest","text":"","code":"df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) # Specify variables to nest using name-variable pairs. # Note that we get one row of output for each unique combination of # non-nested variables. 
df %>% nest(data = c(y, z)) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # Specify variables to nest by (rather than variables to nest) using `.by` df %>% nest(.by = x) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # In this case, since `...` isn't used you can specify the resulting column # name with `.key` df %>% nest(.by = x, .key = \"cols\") #> # A tibble: 3 × 2 #> x cols #> #> 1 1 #> 2 2 #> 3 3 # Use tidyselect syntax and helpers, just like in `dplyr::select()` df %>% nest(data = any_of(c(\"y\", \"z\"))) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # `...` and `.by` can be used together to drop columns you no longer need, # or to include the columns you are nesting by in the inner data frame too. # This drops `z`: df %>% nest(data = y, .by = x) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # This includes `x` in the inner data frame: df %>% nest(data = everything(), .by = x) #> # A tibble: 3 × 2 #> x data #> #> 1 1 #> 2 2 #> 3 3 # Multiple nesting structures can be specified at once iris %>% nest(petal = starts_with(\"Petal\"), sepal = starts_with(\"Sepal\")) #> # A tibble: 3 × 3 #> Species petal sepal #> #> 1 setosa #> 2 versicolor #> 3 virginica iris %>% nest(width = contains(\"Width\"), length = contains(\"Length\")) #> # A tibble: 3 × 3 #> Species width length #> #> 1 setosa #> 2 versicolor #> 3 virginica # Nesting a grouped data frame nests all variables apart from the group vars fish_encounters %>% dplyr::group_by(fish) %>% nest() #> # A tibble: 19 × 2 #> # Groups: fish [19] #> fish data #> #> 1 4842 #> 2 4843 #> 3 4844 #> 4 4845 #> 5 4847 #> 6 4848 #> 7 4849 #> 8 4850 #> 9 4851 #> 10 4854 #> 11 4855 #> 12 4857 #> 13 4858 #> 14 4859 #> 15 4861 #> 16 4862 #> 17 4863 #> 18 4864 #> 19 4865 # That is similar to `nest(.by = )`, except here the result isn't grouped fish_encounters %>% nest(.by = fish) #> # A tibble: 19 × 2 #> fish data #> #> 1 4842 #> 2 4843 #> 3 4844 #> 4 4845 #> 5 4847 #> 6 4848 #> 7 4849 #> 8 4850 #> 9 4851 #> 10 4854 #> 11 4855 #> 12 4857 #> 13 4858 #> 14 4859 #> 15 4861 #> 16 4862 #> 17 4863 #> 18 4864 #> 19 4865 # Nesting is often useful for creating per group models mtcars %>% nest(.by = cyl) %>% dplyr::mutate(models = lapply(data, function(df) lm(mpg ~ wt, data = df))) #> # A tibble: 3 × 3 #> cyl data models #> #> 1 6 #> 2 4 #> 3 8 "},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":null,"dir":"Reference","previous_headings":"","what":"Legacy versions of nest() and unnest() — nest_legacy","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"tidyr 1.0.0 introduced new syntax nest() unnest(). majority existing usage automatically translated new syntax warning. However, need quickly roll back previous behaviour, functions provide previous interface. make old code work , add following code top script:","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"","code":"nest_legacy(data, ..., .key = \"data\") unnest_legacy(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"data data frame. ... Specification columns unnest. 
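To complement the nest() entry above, a minimal sketch of the usual round trip: unnest() (documented separately) reverses nest().

library(tidyr)
df <- tibble(x = c(1, 1, 2), y = 1:3, z = 3:1)
nested <- df %>% nest(data = c(y, z))
nested
# unnest() restores one row per row of the inner data frames
nested %>% unnest(data)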
Use bare variable names functions variables. omitted, defaults list-cols. .key name new column, string symbol. argument passed expression supports quasiquotation (can unquote strings symbols). name captured expression rlang::ensym() (note kind interface symbols represent actual objects now discouraged tidyverse; support backward compatibility). .drop additional list columns dropped? default, unnest() drop unnesting specified columns requires rows duplicated. .id Data frame identifier - supplied, create new column name .id, giving unique identifier. useful list column named. .sep non-NULL, names unnested data frame columns combine name original list-col names nested data frame, separated .sep. .preserve Optionally, list-columns preserve output. duplicated way atomic vectors. dplyr::select() semantics can preserve multiple variables .preserve = c(x, y) .preserve = starts_with(\"list\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/nest_legacy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Legacy versions of nest() and unnest() — nest_legacy","text":"","code":"# Nest and unnest are inverses df <- tibble(x = c(1, 1, 2), y = 3:1) df %>% nest_legacy(y) #> # A tibble: 2 × 2 #> x data #> #> 1 1 #> 2 2 df %>% nest_legacy(y) %>% unnest_legacy() #> # A tibble: 3 × 2 #> x y #> #> 1 1 3 #> 2 1 2 #> 3 2 1 # nesting ------------------------------------------------------------------- as_tibble(iris) %>% nest_legacy(!Species) #> # A tibble: 3 × 2 #> Species data #> #> 1 setosa #> 2 versicolor #> 3 virginica as_tibble(chickwts) %>% nest_legacy(weight) #> # A tibble: 6 × 2 #> feed data #> #> 1 horsebean #> 2 linseed #> 3 soybean #> 4 sunflower #> 5 meatmeal #> 6 casein # unnesting ----------------------------------------------------------------- df <- tibble( x = 1:2, y = list( tibble(z = 1), tibble(z = 3:4) ) ) df %>% unnest_legacy(y) #> # A tibble: 3 × 2 #> x z #> #> 1 1 1 #> 2 2 3 #> 3 2 4 # You can also unnest multiple columns simultaneously df <- tibble( a = list(c(\"a\", \"b\"), \"c\"), b = list(1:2, 3), c = c(11, 22) ) df %>% unnest_legacy(a, b) #> # A tibble: 3 × 3 #> c a b #> #> 1 11 a 1 #> 2 11 b 2 #> 3 22 c 3 # If you omit the column names, it'll unnest all list-cols df %>% unnest_legacy() #> # A tibble: 3 × 3 #> c a b #> #> 1 11 a 1 #> 2 11 b 2 #> 3 22 c 3"},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":null,"dir":"Reference","previous_headings":"","what":"Pack and unpack — pack","title":"Pack and unpack — pack","text":"Packing unpacking preserve length data frame, changing width. pack() makes df narrow collapsing set columns single df-column. unpack() makes data wider expanding df-columns back individual columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pack and unpack — pack","text":"","code":"pack(.data, ..., .names_sep = NULL, .error_call = current_env()) unpack( data, cols, ..., names_sep = NULL, names_repair = \"check_unique\", error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pack and unpack — pack","text":"... pack(), columns pack, specified using name-variable pairs form new_col = c(col1, col2, col3). right hand side can valid tidy select expression. unpack(), dots future extensions must empty. data, .data data frame. cols Columns unpack. 
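As the nest_legacy() entry above notes, old calls translate directly to the new syntax; a small side-by-side sketch:

library(tidyr)
df <- tibble(x = c(1, 1, 2), y = 3:1)
# Legacy interface: name the columns to nest directly
df %>% nest_legacy(y)
# The equivalent call in the new syntax uses a name-variable pair
df %>% nest(data = y)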
names_sep, .names_sep NULL, default, names left . pack(), inner names come former outer names; unpack(), new outer names come inner names. string, inner outer names used together. unpack(), names new outer columns formed pasting together outer inner column names, separated names_sep. pack(), new inner names outer names + names_sep automatically stripped. makes names_sep roughly symmetric packing unpacking. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . error_call, .error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Pack and unpack — pack","text":"Generally, unpacking useful packing simplifies complex data structure. Currently, functions work df-cols, mostly curiosity, seem worth exploring mimic nested column headers popular Excel.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pack.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pack and unpack — pack","text":"","code":"# Packing ------------------------------------------------------------------- # It's not currently clear why you would ever want to pack columns # since few functions work with this sort of data. 
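As a small aside on `names_repair` (an illustrative sketch with a made-up tibble; the exact repaired names are an assumption based on vctrs::vec_as_names()): when unpacking would duplicate an existing column name, the default "check_unique" errors, while "unique" de-duplicates with numeric suffixes.

df <- tibble(x = 1, y = tibble(x = 2))

# errors under the default names_repair = "check_unique"
try(df %>% unpack(y))

# repairs the clashing names with numeric suffixes instead (e.g. x...1, x...2)
df %>% unpack(y, names_repair = "unique")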
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3) df #> # A tibble: 3 × 4 #> x1 x2 x3 y #> #> 1 1 4 7 1 #> 2 2 5 8 2 #> 3 3 6 9 3 df %>% pack(x = starts_with(\"x\")) #> # A tibble: 3 × 2 #> y x$x1 $x2 $x3 #> #> 1 1 1 4 7 #> 2 2 2 5 8 #> 3 3 3 6 9 df %>% pack(x = c(x1, x2, x3), y = y) #> # A tibble: 3 × 2 #> x$x1 $x2 $x3 y$y #> #> 1 1 4 7 1 #> 2 2 5 8 2 #> 3 3 6 9 3 # .names_sep allows you to strip off common prefixes; this # acts as a natural inverse to name_sep in unpack() iris %>% as_tibble() %>% pack( Sepal = starts_with(\"Sepal\"), Petal = starts_with(\"Petal\"), .names_sep = \".\" ) #> # A tibble: 150 × 3 #> Species Sepal$Length $Width Petal$Length $Width #> #> 1 setosa 5.1 3.5 1.4 0.2 #> 2 setosa 4.9 3 1.4 0.2 #> 3 setosa 4.7 3.2 1.3 0.2 #> 4 setosa 4.6 3.1 1.5 0.2 #> 5 setosa 5 3.6 1.4 0.2 #> 6 setosa 5.4 3.9 1.7 0.4 #> 7 setosa 4.6 3.4 1.4 0.3 #> 8 setosa 5 3.4 1.5 0.2 #> 9 setosa 4.4 2.9 1.4 0.2 #> 10 setosa 4.9 3.1 1.5 0.1 #> # ℹ 140 more rows # Unpacking ----------------------------------------------------------------- df <- tibble( x = 1:3, y = tibble(a = 1:3, b = 3:1), z = tibble(X = c(\"a\", \"b\", \"c\"), Y = runif(3), Z = c(TRUE, FALSE, NA)) ) df #> # A tibble: 3 × 3 #> x y$a $b z$X $Y $Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA df %>% unpack(y) #> # A tibble: 3 × 4 #> x a b z$X $Y $Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA df %>% unpack(c(y, z)) #> # A tibble: 3 × 6 #> x a b X Y Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA df %>% unpack(c(y, z), names_sep = \"_\") #> # A tibble: 3 × 6 #> x y_a y_b z_X z_Y z_Z #> #> 1 1 1 3 a 0.0281 TRUE #> 2 2 2 2 b 0.466 FALSE #> 3 3 3 1 c 0.390 NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See %>% details.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pipe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pipe operator — %>%","text":"","code":"lhs %>% rhs"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from wide to long — pivot_longer","title":"Pivot data from wide to long — pivot_longer","text":"pivot_longer() \"lengthens\" data, increasing number rows decreasing number columns. inverse transformation pivot_wider() Learn vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from wide to long — pivot_longer","text":"","code":"pivot_longer( data, cols, ..., cols_vary = \"fastest\", names_to = \"name\", names_prefix = NULL, names_sep = NULL, names_pattern = NULL, names_ptypes = NULL, names_transform = NULL, names_repair = \"check_unique\", values_to = \"value\", values_drop_na = FALSE, values_ptypes = NULL, values_transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from wide to long — pivot_longer","text":"data data frame pivot. cols Columns pivot longer format. ... Additional arguments passed methods. cols_vary pivoting cols longer format, output rows arranged relative original row number? \"fastest\", default, keeps individual rows cols close together output. 
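To illustrate the rough symmetry of `.names_sep`/`names_sep` described above (a sketch reusing the iris example; nothing beyond the calls already shown is assumed), packing with `.names_sep = "."` and then unpacking with `names_sep = "."` should round-trip the original column names:

iris %>%
  as_tibble() %>%
  pack(
    Sepal = starts_with("Sepal"),
    Petal = starts_with("Petal"),
    .names_sep = "."
  ) %>%
  unpack(c(Sepal, Petal), names_sep = ".")
# expected to recover Sepal.Length, Sepal.Width, Petal.Length, Petal.Width alongside Species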
often produces intuitively ordered output least one key column data involved pivoting process. \"slowest\" keeps individual columns cols close together output. often produces intuitively ordered output utilize columns data pivoting process. names_to character vector specifying new column columns create information stored column names data specified cols. length 0, NULL supplied, columns created. length 1, single column created contain column names specified cols. length >1, multiple columns created. case, one names_sep names_pattern must supplied specify column names split. also two additional character values can take advantage : NA discard corresponding component column name. \".value\" indicates corresponding component column name defines name output column containing cell values, overriding values_to entirely. names_prefix regular expression used remove matching text start variable name. names_sep, names_pattern names_to contains multiple values, arguments control column name broken . names_sep takes specification separate(), can either numeric vector (specifying positions break ), single string (specifying regular expression split ). names_pattern takes specification extract(), regular expression containing matching groups (()). arguments give enough control, use pivot_longer_spec() create spec object process manually needed. names_ptypes, values_ptypes Optionally, list column name-prototype pairs. Alternatively, single empty prototype can supplied, applied columns. prototype (ptype short) zero-length vector (like integer() numeric()) defines type, class, attributes vector. Use arguments want confirm created columns types expect. Note want change (instead confirm) types specific columns, use names_transform values_transform instead. names_transform, values_transform Optionally, list column name-function pairs. Alternatively, single function can supplied, applied columns. Use arguments need change types specific columns. example, names_transform = list(week = .integer) convert character variable called week integer. specified, type columns generated names_to character, type variables generated values_to common type input columns used generate . names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. values_to string specifying name column create data stored cell values. names_to character containing special .value sentinel, value ignored, name value column derived part existing column names. values_drop_na TRUE, drop rows contain NAs value_to column. effectively converts explicit missing values implicit missing values, generally used missing values data created structure.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Pivot data from wide to long — pivot_longer","text":"pivot_longer() updated approach gather(), designed simpler use handle use cases. 
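Building on the `names_transform` description above, a minimal sketch (reusing the billboard example that appears below; output not reproduced here) of coercing the generated week column to integer at pivot time:

billboard %>%
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    names_prefix = "wk",
    names_transform = list(week = as.integer),
    values_to = "rank",
    values_drop_na = TRUE
  )
# `week` comes out as an integer rather than a character column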
recommend use pivot_longer() new code; gather() going away longer active development.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from wide to long — pivot_longer","text":"","code":"# See vignette(\"pivot\") for examples and explanation # Simplest case where column names are character data relig_income #> # A tibble: 18 × 11 #> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` #> #> 1 Agnostic 27 34 60 81 76 137 #> 2 Atheist 12 27 37 52 35 70 #> 3 Buddhist 27 21 30 34 33 58 #> 4 Catholic 418 617 732 670 638 1116 #> 5 Don’t know/r… 15 14 15 11 10 35 #> 6 Evangelical … 575 869 1064 982 881 1486 #> 7 Hindu 1 9 7 9 11 34 #> 8 Historically… 228 244 236 238 197 223 #> 9 Jehovah's Wi… 20 27 24 24 21 30 #> 10 Jewish 19 19 25 25 30 95 #> 11 Mainline Prot 289 495 619 655 651 1107 #> 12 Mormon 29 40 48 51 56 112 #> 13 Muslim 6 7 9 10 9 23 #> 14 Orthodox 13 17 23 32 32 47 #> 15 Other Christ… 9 7 11 13 13 14 #> 16 Other Faiths 20 33 40 46 49 63 #> 17 Other World … 5 2 3 4 2 7 #> 18 Unaffiliated 217 299 374 365 341 528 #> # ℹ 4 more variables: `$75-100k` , `$100-150k` , `>150k` , #> # `Don't know/refused` relig_income %>% pivot_longer(!religion, names_to = \"income\", values_to = \"count\") #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows # Slightly more complex case where columns have common prefix, # and missing missings are structural so should be dropped. billboard #> # A tibble: 317 × 79 #> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 #> #> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 #> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA #> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 #> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 #> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 #> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 #> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA #> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 #> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 #> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 #> # ℹ 307 more rows #> # ℹ 69 more variables: wk8 , wk9 , wk10 , wk11 , #> # wk12 , wk13 , wk14 , wk15 , wk16 , #> # wk17 , wk18 , wk19 , wk20 , wk21 , #> # wk22 , wk23 , wk24 , wk25 , wk26 , #> # wk27 , wk28 , wk29 , wk30 , wk31 , #> # wk32 , wk33 , wk34 , wk35 , wk36 , … billboard %>% pivot_longer( cols = starts_with(\"wk\"), names_to = \"week\", names_prefix = \"wk\", values_to = \"rank\", values_drop_na = TRUE ) #> # A tibble: 5,307 × 5 #> artist track date.entered week rank #> #> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 1 87 #> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 2 82 #> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 3 72 #> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 4 77 #> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 5 87 #> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 6 94 #> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 7 99 #> 8 2Ge+her The Hardest Part Of ... 2000-09-02 1 91 #> 9 2Ge+her The Hardest Part Of ... 2000-09-02 2 87 #> 10 2Ge+her The Hardest Part Of ... 
2000-09-02 3 92 #> # ℹ 5,297 more rows # Multiple variables stored in column names who %>% pivot_longer( cols = new_sp_m014:newrel_f65, names_to = c(\"diagnosis\", \"gender\", \"age\"), names_pattern = \"new_?(.*)_(.)(.*)\", values_to = \"count\" ) #> # A tibble: 405,440 × 8 #> country iso2 iso3 year diagnosis gender age count #> #> 1 Afghanistan AF AFG 1980 sp m 014 NA #> 2 Afghanistan AF AFG 1980 sp m 1524 NA #> 3 Afghanistan AF AFG 1980 sp m 2534 NA #> 4 Afghanistan AF AFG 1980 sp m 3544 NA #> 5 Afghanistan AF AFG 1980 sp m 4554 NA #> 6 Afghanistan AF AFG 1980 sp m 5564 NA #> 7 Afghanistan AF AFG 1980 sp m 65 NA #> 8 Afghanistan AF AFG 1980 sp f 014 NA #> 9 Afghanistan AF AFG 1980 sp f 1524 NA #> 10 Afghanistan AF AFG 1980 sp f 2534 NA #> # ℹ 405,430 more rows # Multiple observations per row. Since all columns are used in the pivoting # process, we'll use `cols_vary` to keep values from the original columns # close together in the output. anscombe #> x1 x2 x3 x4 y1 y2 y3 y4 #> 1 10 10 10 8 8.04 9.14 7.46 6.58 #> 2 8 8 8 8 6.95 8.14 6.77 5.76 #> 3 13 13 13 8 7.58 8.74 12.74 7.71 #> 4 9 9 9 8 8.81 8.77 7.11 8.84 #> 5 11 11 11 8 8.33 9.26 7.81 8.47 #> 6 14 14 14 8 9.96 8.10 8.84 7.04 #> 7 6 6 6 8 7.24 6.13 6.08 5.25 #> 8 4 4 4 19 4.26 3.10 5.39 12.50 #> 9 12 12 12 8 10.84 9.13 8.15 5.56 #> 10 7 7 7 8 4.82 7.26 6.42 7.91 #> 11 5 5 5 8 5.68 4.74 5.73 6.89 anscombe %>% pivot_longer( everything(), cols_vary = \"slowest\", names_to = c(\".value\", \"set\"), names_pattern = \"(.)(.)\" ) #> # A tibble: 44 × 3 #> set x y #> #> 1 1 10 8.04 #> 2 1 8 6.95 #> 3 1 13 7.58 #> 4 1 9 8.81 #> 5 1 11 8.33 #> 6 1 14 9.96 #> 7 1 6 7.24 #> 8 1 4 4.26 #> 9 1 12 10.8 #> 10 1 7 4.82 #> # ℹ 34 more rows"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from wide to long using a spec — pivot_longer_spec","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"low level interface pivoting, inspired cdata package, allows describe pivoting data frame.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"","code":"pivot_longer_spec( data, spec, ..., cols_vary = \"fastest\", names_repair = \"check_unique\", values_drop_na = FALSE, values_ptypes = NULL, values_transform = NULL, error_call = current_env() ) build_longer_spec( data, cols, ..., names_to = \"name\", values_to = \"value\", names_prefix = NULL, names_sep = NULL, names_pattern = NULL, names_ptypes = NULL, names_transform = NULL, error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"data data frame pivot. spec specification data frame. useful complex pivots gives greater control metadata stored column names turns columns result. Must data frame containing character .name .value columns. Additional columns spec named match columns long format dataset contain values corresponding columns pivoted wide format. special .seq variable used disambiguate rows internally; automatically removed pivoting. ... dots future extensions must empty. cols_vary pivoting cols longer format, output rows arranged relative original row number? 
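As a hedged sketch tying the `who` example above to the spec interface introduced here: the same `names_pattern` split can be captured in a spec, inspected, and then applied; only functions documented on this page are used.

spec <- who %>%
  build_longer_spec(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"),
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  )
head(spec)   # one row per pivoted column: .name, .value, diagnosis, gender, age
who %>% pivot_longer_spec(spec)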
\"fastest\", default, keeps individual rows cols close together output. often produces intuitively ordered output least one key column data involved pivoting process. \"slowest\" keeps individual columns cols close together output. often produces intuitively ordered output utilize columns data pivoting process. names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. values_drop_na TRUE, drop rows contain NAs value_to column. effectively converts explicit missing values implicit missing values, generally used missing values data created structure. error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information. cols Columns pivot longer format. names_to character vector specifying new column columns create information stored column names data specified cols. length 0, NULL supplied, columns created. length 1, single column created contain column names specified cols. length >1, multiple columns created. case, one names_sep names_pattern must supplied specify column names split. also two additional character values can take advantage : NA discard corresponding component column name. \".value\" indicates corresponding component column name defines name output column containing cell values, overriding values_to entirely. values_to string specifying name column create data stored cell values. names_to character containing special .value sentinel, value ignored, name value column derived part existing column names. names_prefix regular expression used remove matching text start variable name. names_sep, names_pattern names_to contains multiple values, arguments control column name broken . names_sep takes specification separate(), can either numeric vector (specifying positions break ), single string (specifying regular expression split ). names_pattern takes specification extract(), regular expression containing matching groups (()). arguments give enough control, use pivot_longer_spec() create spec object process manually needed. names_ptypes, values_ptypes Optionally, list column name-prototype pairs. Alternatively, single empty prototype can supplied, applied columns. prototype (ptype short) zero-length vector (like integer() numeric()) defines type, class, attributes vector. Use arguments want confirm created columns types expect. Note want change (instead confirm) types specific columns, use names_transform values_transform instead. names_transform, values_transform Optionally, list column name-function pairs. Alternatively, single function can supplied, applied columns. Use arguments need change types specific columns. example, names_transform = list(week = .integer) convert character variable called week integer. specified, type columns generated names_to character, type variables generated values_to common type input columns used generate .","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_longer_spec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from wide to long using a spec — pivot_longer_spec","text":"","code":"# See vignette(\"pivot\") for examples and explanation # Use `build_longer_spec()` to build `spec` using similar syntax to `pivot_longer()` # and run `pivot_longer_spec()` based on `spec`. 
spec <- relig_income %>% build_longer_spec( cols = !religion, names_to = \"income\", values_to = \"count\" ) spec #> # A tibble: 10 × 3 #> .name .value income #> #> 1 <$10k count <$10k #> 2 $10-20k count $10-20k #> 3 $20-30k count $20-30k #> 4 $30-40k count $30-40k #> 5 $40-50k count $40-50k #> 6 $50-75k count $50-75k #> 7 $75-100k count $75-100k #> 8 $100-150k count $100-150k #> 9 >150k count >150k #> 10 Don't know/refused count Don't know/refused pivot_longer_spec(relig_income, spec) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows # Is equivalent to: relig_income %>% pivot_longer( cols = !religion, names_to = \"income\", values_to = \"count\" ) #> # A tibble: 180 × 3 #> religion income count #> #> 1 Agnostic <$10k 27 #> 2 Agnostic $10-20k 34 #> 3 Agnostic $20-30k 60 #> 4 Agnostic $30-40k 81 #> 5 Agnostic $40-50k 76 #> 6 Agnostic $50-75k 137 #> 7 Agnostic $75-100k 122 #> 8 Agnostic $100-150k 109 #> 9 Agnostic >150k 84 #> 10 Agnostic Don't know/refused 96 #> # ℹ 170 more rows"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from long to wide — pivot_wider","title":"Pivot data from long to wide — pivot_wider","text":"pivot_wider() \"widens\" data, increasing number columns decreasing number rows. inverse transformation pivot_longer(). Learn vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from long to wide — pivot_wider","text":"","code":"pivot_wider( data, ..., id_cols = NULL, id_expand = FALSE, names_from = name, names_prefix = \"\", names_sep = \"_\", names_glue = NULL, names_sort = FALSE, names_vary = \"fastest\", names_expand = FALSE, names_repair = \"check_unique\", values_from = value, values_fill = NULL, values_fn = NULL, unused_fn = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from long to wide — pivot_wider","text":"data data frame pivot. ... Additional arguments passed methods. id_cols set columns uniquely identify observation. Typically used redundant variables, .e. variables whose values perfectly correlated existing variables. Defaults columns data except columns specified names_from values_from. tidyselect expression supplied, evaluated data removing columns specified names_from values_from. id_expand values id_cols columns expanded expand() pivoting? results rows, output contain complete expansion possible values id_cols. Implicit factor levels represented data become explicit. Additionally, row values corresponding expanded id_cols sorted. names_from, values_from pair arguments describing column (columns) get name output column (names_from), column (columns) get cell values (values_from). values_from contains multiple values, value added front output column. names_prefix String added start every variable name. particularly useful names_from numeric vector want create syntactic variable names. names_sep names_from values_from contains multiple variables, used join values together single string use column name. 
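To make the "inverse transformation" claim concrete, an illustrative round trip using the relig_income example from pivot_longer() (a sketch, not part of the reference page):

long <- relig_income %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")

# widening the long form should recover the original 18 x 11 layout
long %>%
  pivot_wider(names_from = income, values_from = count)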
names_glue Instead names_sep names_prefix, can supply glue specification uses names_from columns (special .value) create custom column names. names_sort column names sorted? FALSE, default, column names ordered first appearance. names_vary names_from identifies column (columns) multiple unique values, multiple values_from columns provided, order resulting column names combined? \"fastest\" varies names_from values fastest, resulting column naming scheme form: value1_name1, value1_name2, value2_name1, value2_name2. default. \"slowest\" varies names_from values slowest, resulting column naming scheme form: value1_name1, value2_name1, value1_name2, value2_name2. names_expand values names_from columns expanded expand() pivoting? results columns, output contain column names corresponding complete expansion possible values names_from. Implicit factor levels represented data become explicit. Additionally, column names sorted, identical names_sort produce. names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. values_fill Optionally, (scalar) value specifies value filled missing. can named list want apply different fill values different value columns. values_fn Optionally, function applied value cell output. typically use combination id_cols names_from columns uniquely identify observation. can named list want apply different aggregations different values_from columns. unused_fn Optionally, function applied summarize values unused columns (.e. columns identified id_cols, names_from, values_from). default drops unused columns result. can named list want apply different aggregations different unused columns. id_cols must supplied unused_fn useful, since otherwise unspecified columns considered id_cols. similar grouping id_cols summarizing unused columns using unused_fn.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Pivot data from long to wide — pivot_wider","text":"pivot_wider() updated approach spread(), designed simpler use handle use cases. 
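A minimal, made-up sketch of `unused_fn` (the tibble below is invented for illustration): with `id_cols` supplied, a column that is neither an id, names, nor values column can be summarised instead of being dropped.

updates <- tibble(
  id    = c(1, 1, 2, 2),
  name  = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40),
  n_obs = c(3, 5, 2, 4)
)

updates %>%
  pivot_wider(
    id_cols     = id,
    names_from  = name,
    values_from = value,
    unused_fn   = max   # keep the largest n_obs per id rather than dropping the column
  )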
recommend use pivot_wider() new code; spread() going away longer active development.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from long to wide — pivot_wider","text":"","code":"# See vignette(\"pivot\") for examples and explanation fish_encounters #> # A tibble: 114 × 3 #> fish station seen #> #> 1 4842 Release 1 #> 2 4842 I80_1 1 #> 3 4842 Lisbon 1 #> 4 4842 Rstr 1 #> 5 4842 Base_TD 1 #> 6 4842 BCE 1 #> 7 4842 BCW 1 #> 8 4842 BCE2 1 #> 9 4842 BCW2 1 #> 10 4842 MAE 1 #> # ℹ 104 more rows fish_encounters %>% pivot_wider(names_from = station, values_from = seen) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 NA NA NA NA NA #> 5 4847 1 1 1 NA NA NA NA NA NA NA #> 6 4848 1 1 1 1 NA NA NA NA NA NA #> 7 4849 1 1 NA NA NA NA NA NA NA NA #> 8 4850 1 1 NA 1 1 1 1 NA NA NA #> 9 4851 1 1 NA NA NA NA NA NA NA NA #> 10 4854 1 1 NA NA NA NA NA NA NA NA #> 11 4855 1 1 1 1 1 NA NA NA NA NA #> 12 4857 1 1 1 1 1 1 1 1 1 NA #> 13 4858 1 1 1 1 1 1 1 1 1 1 #> 14 4859 1 1 1 1 1 NA NA NA NA NA #> 15 4861 1 1 1 1 1 1 1 1 1 1 #> 16 4862 1 1 1 1 1 1 1 1 1 NA #> 17 4863 1 1 NA NA NA NA NA NA NA NA #> 18 4864 1 1 NA NA NA NA NA NA NA NA #> 19 4865 1 1 1 NA NA NA NA NA NA NA #> # ℹ 1 more variable: MAW # Fill in missing values fish_encounters %>% pivot_wider(names_from = station, values_from = seen, values_fill = 0) #> # A tibble: 19 × 12 #> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE #> #> 1 4842 1 1 1 1 1 1 1 1 1 1 #> 2 4843 1 1 1 1 1 1 1 1 1 1 #> 3 4844 1 1 1 1 1 1 1 1 1 1 #> 4 4845 1 1 1 1 1 0 0 0 0 0 #> 5 4847 1 1 1 0 0 0 0 0 0 0 #> 6 4848 1 1 1 1 0 0 0 0 0 0 #> 7 4849 1 1 0 0 0 0 0 0 0 0 #> 8 4850 1 1 0 1 1 1 1 0 0 0 #> 9 4851 1 1 0 0 0 0 0 0 0 0 #> 10 4854 1 1 0 0 0 0 0 0 0 0 #> 11 4855 1 1 1 1 1 0 0 0 0 0 #> 12 4857 1 1 1 1 1 1 1 1 1 0 #> 13 4858 1 1 1 1 1 1 1 1 1 1 #> 14 4859 1 1 1 1 1 0 0 0 0 0 #> 15 4861 1 1 1 1 1 1 1 1 1 1 #> 16 4862 1 1 1 1 1 1 1 1 1 0 #> 17 4863 1 1 0 0 0 0 0 0 0 0 #> 18 4864 1 1 0 0 0 0 0 0 0 0 #> 19 4865 1 1 1 0 0 0 0 0 0 0 #> # ℹ 1 more variable: MAW # Generate column names from multiple variables us_rent_income #> # A tibble: 104 × 5 #> GEOID NAME variable estimate moe #> #> 1 01 Alabama income 24476 136 #> 2 01 Alabama rent 747 3 #> 3 02 Alaska income 32940 508 #> 4 02 Alaska rent 1200 13 #> 5 04 Arizona income 27517 148 #> 6 04 Arizona rent 972 4 #> 7 05 Arkansas income 23789 165 #> 8 05 Arkansas rent 709 5 #> 9 06 California income 29454 109 #> 10 06 California rent 1358 3 #> # ℹ 94 more rows us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # You can control whether `names_from` values vary fastest or slowest # relative to the `values_from` column names using `names_vary`. 
us_rent_income %>% pivot_wider( names_from = variable, values_from = c(estimate, moe), names_vary = \"slowest\" ) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income moe_income estimate_rent moe_rent #> #> 1 01 Alabama 24476 136 747 3 #> 2 02 Alaska 32940 508 1200 13 #> 3 04 Arizona 27517 148 972 4 #> 4 05 Arkansas 23789 165 709 5 #> 5 06 California 29454 109 1358 3 #> 6 08 Colorado 32401 109 1125 5 #> 7 09 Connecticut 35326 195 1123 5 #> 8 10 Delaware 31560 247 1076 10 #> 9 11 District of Co… 43198 681 1424 17 #> 10 12 Florida 25952 70 1077 3 #> # ℹ 42 more rows # When there are multiple `names_from` or `values_from`, you can use # use `names_sep` or `names_glue` to control the output variable names us_rent_income %>% pivot_wider( names_from = variable, names_sep = \".\", values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME estimate.income estimate.rent moe.income moe.rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows us_rent_income %>% pivot_wider( names_from = variable, names_glue = \"{variable}_{.value}\", values_from = c(estimate, moe) ) #> # A tibble: 52 × 6 #> GEOID NAME income_estimate rent_estimate income_moe rent_moe #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # Can perform aggregation with `values_fn` warpbreaks <- as_tibble(warpbreaks[c(\"wool\", \"tension\", \"breaks\")]) warpbreaks #> # A tibble: 54 × 3 #> wool tension breaks #> #> 1 A L 26 #> 2 A L 30 #> 3 A L 54 #> 4 A L 25 #> 5 A L 70 #> 6 A L 52 #> 7 A L 51 #> 8 A L 26 #> 9 A L 67 #> 10 A M 18 #> # ℹ 44 more rows warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = mean ) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L 44.6 28.2 #> 2 M 24 28.8 #> 3 H 24.6 18.8 # Can pass an anonymous function to `values_fn` when you # need to supply additional arguments warpbreaks$breaks[1] <- NA warpbreaks %>% pivot_wider( names_from = wool, values_from = breaks, values_fn = ~ mean(.x, na.rm = TRUE) ) #> # A tibble: 3 × 3 #> tension A B #> #> 1 L 46.9 28.2 #> 2 M 24 28.8 #> 3 H 24.6 18.8"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot data from long to wide using a spec — pivot_wider_spec","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"low level interface pivoting, inspired cdata package, allows describe pivoting data frame.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"","code":"pivot_wider_spec( data, spec, ..., names_repair = \"check_unique\", id_cols = NULL, id_expand = FALSE, values_fill = NULL, values_fn = NULL, unused_fn = NULL, error_call = current_env() ) build_wider_spec( data, ..., names_from = name, values_from = value, names_prefix = 
\"\", names_sep = \"_\", names_glue = NULL, names_sort = FALSE, names_vary = \"fastest\", names_expand = FALSE, error_call = current_env() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"data data frame pivot. spec specification data frame. useful complex pivots gives greater control metadata stored columns become column names result. Must data frame containing character .name .value columns. Additional columns spec named match columns long format dataset contain values corresponding columns pivoted wide format. special .seq variable used disambiguate rows internally; automatically removed pivoting. ... dots future extensions must empty. names_repair happens output invalid column names? default, \"check_unique\" error columns duplicated. Use \"minimal\" allow duplicates output, \"unique\" de-duplicated adding numeric suffixes. See vctrs::vec_as_names() options. id_cols set columns uniquely identifies observation. Defaults columns data except columns specified spec$.value columns spec named .name .value. Typically used redundant variables, .e. variables whose values perfectly correlated existing variables. id_expand values id_cols columns expanded expand() pivoting? results rows, output contain complete expansion possible values id_cols. Implicit factor levels represented data become explicit. Additionally, row values corresponding expanded id_cols sorted. values_fill Optionally, (scalar) value specifies value filled missing. can named list want apply different fill values different value columns. values_fn Optionally, function applied value cell output. typically use combination id_cols names_from columns uniquely identify observation. can named list want apply different aggregations different values_from columns. unused_fn Optionally, function applied summarize values unused columns (.e. columns identified id_cols, names_from, values_from). default drops unused columns result. can named list want apply different aggregations different unused columns. id_cols must supplied unused_fn useful, since otherwise unspecified columns considered id_cols. similar grouping id_cols summarizing unused columns using unused_fn. error_call execution environment currently running function, e.g. caller_env(). function mentioned error messages source error. See call argument abort() information. names_from, values_from pair arguments describing column (columns) get name output column (names_from), column (columns) get cell values (values_from). values_from contains multiple values, value added front output column. names_prefix String added start every variable name. particularly useful names_from numeric vector want create syntactic variable names. names_sep names_from values_from contains multiple variables, used join values together single string use column name. names_glue Instead names_sep names_prefix, can supply glue specification uses names_from columns (special .value) create custom column names. names_sort column names sorted? FALSE, default, column names ordered first appearance. names_vary names_from identifies column (columns) multiple unique values, multiple values_from columns provided, order resulting column names combined? \"fastest\" varies names_from values fastest, resulting column naming scheme form: value1_name1, value1_name2, value2_name1, value2_name2. default. 
\"slowest\" varies names_from values slowest, resulting column naming scheme form: value1_name1, value2_name1, value1_name2, value2_name2. names_expand values names_from columns expanded expand() pivoting? results columns, output contain column names corresponding complete expansion possible values names_from. Implicit factor levels represented data become explicit. Additionally, column names sorted, identical names_sort produce.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/pivot_wider_spec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot data from long to wide using a spec — pivot_wider_spec","text":"","code":"# See vignette(\"pivot\") for examples and explanation us_rent_income #> # A tibble: 104 × 5 #> GEOID NAME variable estimate moe #> #> 1 01 Alabama income 24476 136 #> 2 01 Alabama rent 747 3 #> 3 02 Alaska income 32940 508 #> 4 02 Alaska rent 1200 13 #> 5 04 Arizona income 27517 148 #> 6 04 Arizona rent 972 4 #> 7 05 Arkansas income 23789 165 #> 8 05 Arkansas rent 709 5 #> 9 06 California income 29454 109 #> 10 06 California rent 1358 3 #> # ℹ 94 more rows spec1 <- us_rent_income %>% build_wider_spec(names_from = variable, values_from = c(estimate, moe)) spec1 #> # A tibble: 4 × 3 #> .name .value variable #> #> 1 estimate_income estimate income #> 2 estimate_rent estimate rent #> 3 moe_income moe income #> 4 moe_rent moe rent us_rent_income %>% pivot_wider_spec(spec1) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # Is equivalent to us_rent_income %>% pivot_wider(names_from = variable, values_from = c(estimate, moe)) #> # A tibble: 52 × 6 #> GEOID NAME estimate_income estimate_rent moe_income moe_rent #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Co… 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows # `pivot_wider_spec()` provides more control over column names and output format # instead of creating columns with estimate_ and moe_ prefixes, # keep original variable name for estimates and attach _moe as suffix spec2 <- tibble( .name = c(\"income\", \"rent\", \"income_moe\", \"rent_moe\"), .value = c(\"estimate\", \"estimate\", \"moe\", \"moe\"), variable = c(\"income\", \"rent\", \"income\", \"rent\") ) us_rent_income %>% pivot_wider_spec(spec2) #> # A tibble: 52 × 6 #> GEOID NAME income rent income_moe rent_moe #> #> 1 01 Alabama 24476 747 136 3 #> 2 02 Alaska 32940 1200 508 13 #> 3 04 Arizona 27517 972 148 4 #> 4 05 Arkansas 23789 709 165 5 #> 5 06 California 29454 1358 109 3 #> 6 08 Colorado 32401 1125 109 5 #> 7 09 Connecticut 35326 1123 195 5 #> 8 10 Delaware 31560 1076 247 10 #> 9 11 District of Columbia 43198 1424 681 17 #> 10 12 Florida 25952 1077 70 3 #> # ℹ 42 more rows"},{"path":"https://tidyr.tidyverse.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — 
reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. tibble as_tibble, tibble, tribble tidyselect all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":null,"dir":"Reference","previous_headings":"","what":"Pew religion and income survey — relig_income","title":"Pew religion and income survey — relig_income","text":"Pew religion income survey","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pew religion and income survey — relig_income","text":"","code":"relig_income"},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Pew religion and income survey — relig_income","text":"dataset variables: religion Name religion <$10k-Don\\'t know/refused Number respondees income range column name","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/relig_income.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Pew religion and income survey — relig_income","text":"Downloaded https://www.pewresearch.org/religious-landscape-study/database/ (downloaded November 2009)","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":null,"dir":"Reference","previous_headings":"","what":"Replace NAs with specified values — replace_na","title":"Replace NAs with specified values — replace_na","text":"Replace NAs specified values","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Replace NAs with specified values — replace_na","text":"","code":"replace_na(data, replace, ...)"},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Replace NAs with specified values — replace_na","text":"data data frame vector. replace data data frame, replace takes named list values, one value column missing values replaced. value replace cast type column data used replacement . data vector, replace takes single value. single value replaces missing values vector. replace cast type data. ... Additional arguments methods. 
Currently unused.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Replace NAs with specified values — replace_na","text":"replace_na() returns object type data.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/replace_na.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Replace NAs with specified values — replace_na","text":"","code":"# Replace NAs in a data frame df <- tibble(x = c(1, 2, NA), y = c(\"a\", NA, \"b\")) df %>% replace_na(list(x = 0, y = \"unknown\")) #> # A tibble: 3 × 2 #> x y #> #> 1 1 a #> 2 2 unknown #> 3 0 b # Replace NAs in a vector df %>% dplyr::mutate(x = replace_na(x, 0)) #> # A tibble: 3 × 2 #> x y #> #> 1 1 a #> 2 2 NA #> 3 0 b # OR df$x %>% replace_na(0) #> [1] 1 2 0 df$y %>% replace_na(\"unknown\") #> [1] \"a\" \"unknown\" \"b\" # Replace NULLs in a list: NULLs are the list-col equivalent of NAs df_list <- tibble(z = list(1:5, NULL, 10:20)) df_list %>% replace_na(list(z = list(5))) #> # A tibble: 3 × 1 #> z #> #> 1 #> 2 #> 3 "},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":null,"dir":"Reference","previous_headings":"","what":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"separate() superseded favour separate_wider_position() separate_wider_delim() two functions make two uses obvious, API polished, handling problems better. Superseded functions go away, receive critical bug fixes. Given either regular expression vector character positions, separate() turns single character column multiple columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"","code":"separate( data, col, into, sep = \"[^[:alnum:]]+\", remove = TRUE, convert = FALSE, extra = \"warn\", fill = \"warn\", ... )"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"data data frame. col Column expand. Names new variables create character vector. Use NA omit variable output. sep Separator columns. character, sep interpreted regular expression. default value regular expression matches sequence non-alphanumeric values. numeric, sep interpreted character positions split . Positive values start 1 far-left string; negative value start -1 far-right string. length sep one less . remove TRUE, remove input column output data frame. convert TRUE, run type.convert() .= TRUE new columns. useful component columns integer, numeric logical. NB: cause string \"NA\"s converted NAs. extra sep character vector, controls happens many pieces. three valid options: \"warn\" (default): emit warning drop extra values. \"drop\": drop extra values without warning. \"merge\": splits length() times fill sep character vector, controls happens enough pieces. three valid options: \"warn\" (default): emit warning fill right \"right\": fill missing values right \"left\": fill missing values left ... 
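A common pattern worth sketching alongside the replace_na() examples above (assumed usage combining it with dplyr::across(); not taken from the reference page) is replacing NAs in every column of a given type at once:

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

df %>%
  dplyr::mutate(
    dplyr::across(where(is.numeric), ~ replace_na(.x, 0)),
    dplyr::across(where(is.character), ~ replace_na(.x, "unknown"))
  )
# same result as replace_na(list(x = 0, y = "unknown")) above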
Additional arguments passed methods.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/separate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Separate a character column into multiple columns with a regular expression or numeric locations — separate","text":"","code":"# If you want to split by any non-alphanumeric value (the default): df <- tibble(x = c(NA, \"x.y\", \"x.z\", \"y.z\")) df %>% separate(x, c(\"A\", \"B\")) #> # A tibble: 4 × 2 #> A B #> #> 1 NA NA #> 2 x y #> 3 x z #> 4 y z # If you just want the second variable: df %>% separate(x, c(NA, \"B\")) #> # A tibble: 4 × 1 #> B #> #> 1 NA #> 2 y #> 3 z #> 4 z # We now recommend separate_wider_delim() instead: df %>% separate_wider_delim(x, \".\", names = c(\"A\", \"B\")) #> # A tibble: 4 × 2 #> A B #> #> 1 NA NA #> 2 x y #> 3 x z #> 4 y z df %>% separate_wider_delim(x, \".\", names = c(NA, \"B\")) #> # A tibble: 4 × 1 #> B #> #> 1 NA #> 2 y #> 3 z #> 4 z # Controlling uneven splits ------------------------------------------------- # If every row doesn't split into the same number of pieces, use # the extra and fill arguments to control what happens: df <- tibble(x = c(\"x\", \"x y\", \"x y z\", NA)) df %>% separate(x, c(\"a\", \"b\")) #> Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [3]. #> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1]. #> # A tibble: 4 × 2 #> a b #> #> 1 x NA #> 2 x y #> 3 x y #> 4 NA NA # The same behaviour as previous, but drops the c without warnings: df %>% separate(x, c(\"a\", \"b\"), extra = \"drop\", fill = \"right\") #> # A tibble: 4 × 2 #> a b #> #> 1 x NA #> 2 x y #> 3 x y #> 4 NA NA # Opposite of previous, keeping the c and filling left: df %>% separate(x, c(\"a\", \"b\"), extra = \"merge\", fill = \"left\") #> # A tibble: 4 × 2 #> a b #> #> 1 NA x #> 2 x y #> 3 x y z #> 4 NA NA # Or you can keep all three: df %>% separate(x, c(\"a\", \"b\", \"c\")) #> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 2 rows [1, 2]. #> # A tibble: 4 × 3 #> a b c #> #> 1 x NA NA #> 2 x y NA #> 3 x y z #> 4 NA NA NA # To only split a specified number of times use extra = \"merge\": df <- tibble(x = c(\"x: 123\", \"y: error: 7\")) df %>% separate(x, c(\"key\", \"value\"), \": \", extra = \"merge\") #> # A tibble: 2 × 2 #> key value #> #> 1 x 123 #> 2 y error: 7 # Controlling column types -------------------------------------------------- # convert = TRUE detects column classes: df <- tibble(x = c(\"x:1\", \"x:2\", \"y:4\", \"z\", NA)) df %>% separate(x, c(\"key\", \"value\"), \":\") %>% str() #> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4]. #> tibble [5 × 2] (S3: tbl_df/tbl/data.frame) #> $ key : chr [1:5] \"x\" \"x\" \"y\" \"z\" ... #> $ value: chr [1:5] \"1\" \"2\" \"4\" NA ... df %>% separate(x, c(\"key\", \"value\"), \":\", convert = TRUE) %>% str() #> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4]. #> tibble [5 × 2] (S3: tbl_df/tbl/data.frame) #> $ key : chr [1:5] \"x\" \"x\" \"y\" \"z\" ... #> $ value: int [1:5] 1 2 4 NA NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Split a string into rows — separate_longer_delim","title":"Split a string into rows — separate_longer_delim","text":"functions takes string splits multiple rows: separate_longer_delim() splits delimiter. 
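Carrying the "key: value" example above over to the recommended interface (a sketch; the mapping from extra = "merge" to too_many = "merge" follows from the argument descriptions on the two pages):

df <- tibble(x = c("x: 123", "y: error: 7"))

# separate(extra = "merge") translated to separate_wider_delim()
df %>%
  separate_wider_delim(x, delim = ": ", names = c("key", "value"), too_many = "merge")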
separate_longer_position() splits fixed width.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Split a string into rows — separate_longer_delim","text":"","code":"separate_longer_delim(data, cols, delim, ...) separate_longer_position(data, cols, width, ..., keep_empty = FALSE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Split a string into rows — separate_longer_delim","text":"data data frame. cols Columns separate. delim separate_longer_delim(), string giving delimiter values. default, interpreted fixed string; use stringr::regex() friends split ways. ... dots future extensions must empty. width separate_longer_position(), integer giving number characters split . keep_empty default, get ceiling(nchar(x) / width) rows observation. nchar(x) zero, means entire input row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements missing value.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Split a string into rows — separate_longer_delim","text":"data frame based data. columns, different rows.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_longer_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Split a string into rows — separate_longer_delim","text":"","code":"df <- tibble(id = 1:4, x = c(\"x\", \"x y\", \"x y z\", NA)) df %>% separate_longer_delim(x, delim = \" \") #> # A tibble: 7 × 2 #> id x #> #> 1 1 x #> 2 2 x #> 3 2 y #> 4 3 x #> 5 3 y #> 6 3 z #> 7 4 NA # You can separate multiple columns at once if they have the same structure df <- tibble(id = 1:3, x = c(\"x\", \"x y\", \"x y z\"), y = c(\"a\", \"a b\", \"a b c\")) df %>% separate_longer_delim(c(x, y), delim = \" \") #> # A tibble: 6 × 3 #> id x y #> #> 1 1 x a #> 2 2 x a #> 3 2 y b #> 4 3 x a #> 5 3 y b #> 6 3 z c # Or instead split by a fixed length df <- tibble(id = 1:3, x = c(\"ab\", \"def\", \"\")) df %>% separate_longer_position(x, 1) #> # A tibble: 5 × 2 #> id x #> #> 1 1 a #> 2 1 b #> 3 2 d #> 4 2 e #> 5 2 f df %>% separate_longer_position(x, 2) #> # A tibble: 3 × 2 #> id x #> #> 1 1 ab #> 2 2 de #> 3 2 f df %>% separate_longer_position(x, 2, keep_empty = TRUE) #> # A tibble: 4 × 2 #> id x #> #> 1 1 ab #> 2 2 de #> 3 2 f #> 4 3 NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":null,"dir":"Reference","previous_headings":"","what":"Separate a collapsed column into multiple rows — separate_rows","title":"Separate a collapsed column into multiple rows — separate_rows","text":"separate_rows() superseded favour separate_longer_delim() consistent API separate functions. Superseded functions go away, receive critical bug fixes. 
variable contains observations multiple delimited values, separate_rows() separates values places one row.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Separate a collapsed column into multiple rows — separate_rows","text":"","code":"separate_rows(data, ..., sep = \"[^[:alnum:].]+\", convert = FALSE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Separate a collapsed column into multiple rows — separate_rows","text":"data data frame. ... Columns separate across multiple rows sep Separator delimiting collapsed values. convert TRUE automatically run type.convert() key column. useful column types actually numeric, integer, logical.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_rows.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Separate a collapsed column into multiple rows — separate_rows","text":"","code":"df <- tibble( x = 1:3, y = c(\"a\", \"d,e,f\", \"g,h\"), z = c(\"1\", \"2,3,4\", \"5,6\") ) separate_rows(df, y, z, convert = TRUE) #> # A tibble: 6 × 3 #> x y z #> #> 1 1 a 1 #> 2 2 d 2 #> 3 2 e 3 #> 4 2 f 4 #> 5 3 g 5 #> 6 3 h 6 # Now recommended df %>% separate_longer_delim(c(y, z), delim = \",\") #> # A tibble: 6 × 3 #> x y z #> #> 1 1 a 1 #> 2 2 d 2 #> 3 2 e 3 #> 4 2 f 4 #> 5 3 g 5 #> 6 3 h 6"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Split a string into columns — separate_wider_delim","title":"Split a string into columns — separate_wider_delim","text":"functions takes string column splits multiple new columns: separate_wider_delim() splits delimiter. separate_wider_position() splits fixed widths. separate_wider_regex() splits regular expression matches. functions equivalent separate() extract(), use stringr underlying string manipulation engine, interfaces reflect learned unnest_wider() unnest_longer().","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Split a string into columns — separate_wider_delim","text":"","code":"separate_wider_delim( data, cols, delim, ..., names = NULL, names_sep = NULL, names_repair = \"check_unique\", too_few = c(\"error\", \"debug\", \"align_start\", \"align_end\"), too_many = c(\"error\", \"debug\", \"drop\", \"merge\"), cols_remove = TRUE ) separate_wider_position( data, cols, widths, ..., names_sep = NULL, names_repair = \"check_unique\", too_few = c(\"error\", \"debug\", \"align_start\"), too_many = c(\"error\", \"debug\", \"drop\"), cols_remove = TRUE ) separate_wider_regex( data, cols, patterns, ..., names_sep = NULL, names_repair = \"check_unique\", too_few = c(\"error\", \"debug\", \"align_start\"), cols_remove = TRUE )"},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Split a string into columns — separate_wider_delim","text":"data data frame. cols Columns separate. delim separate_wider_delim(), string giving delimiter values. default, interpreted fixed string; use stringr::regex() friends split ways. ... dots future extensions must empty. names separate_wider_delim(), character vector output column names. 
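One difference worth noting when moving from separate_rows() to the recommended function (an observation based on the usage shown above, separate_longer_delim(data, cols, delim, ...), which has no convert argument): the type conversion that separate_rows(convert = TRUE) performed has to be done afterwards, e.g.

df <- tibble(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  z = c("1", "2,3,4", "5,6")
)

df %>%
  separate_longer_delim(c(y, z), delim = ",") %>%
  dplyr::mutate(z = as.integer(z))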
Use NA components want appear output; number non-NA elements determines number new columns result. names_sep supplied, output names composed input column name followed separator followed new column name. Required cols selects multiple columns. separate_wider_delim() can specify instead names, case names generated source column name, names_sep, numeric suffix. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . too_few happen value separates pieces? \"error\", default, throw error. \"debug\" adds additional columns output help locate resolve underlying problem. option intended help debug issue address generally remain final code. \"align_start\" aligns starts short matches, adding NA end pad correct length. \"align_end\" (separate_wider_delim() ) aligns ends short matches, adding NA start pad correct length. too_many happen value separates many pieces? \"error\", default, throw error. \"debug\" add additional columns output help locate resolve underlying problem. \"drop\" silently drop extra pieces. \"merge\" (separate_wider_delim() ) merge together additional pieces. cols_remove input cols removed output? Always FALSE too_few too_many set \"debug\". widths named numeric vector names become column names, values specify column width. Unnamed components match, included output. patterns named character vector names become column names values regular expressions match contents vector. Unnamed components match, included output.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Split a string into columns — separate_wider_delim","text":"data frame based data. rows, different columns: primary purpose functions create new columns components string. separate_wider_delim() names new columns come names. separate_wider_position() names come names widths. separate_wider_regex() names come names patterns. too_few too_many \"debug\", output contain additional columns useful debugging: {col}_ok: logical vector tells input ok . Use quickly find problematic rows. {col}_remainder: text remaining separation. {col}_pieces, {col}_width, {col}_matches: number pieces, number characters, number matches separate_wider_delim(), separate_wider_position() separate_regexp_wider() respectively. cols_remove = TRUE (default), input cols removed output.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/separate_wider_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Split a string into columns — separate_wider_delim","text":"","code":"df <- tibble(id = 1:3, x = c(\"m-123\", \"f-455\", \"f-123\")) # There are three basic ways to split up a string into pieces: # 1. with a delimiter df %>% separate_wider_delim(x, delim = \"-\", names = c(\"gender\", \"unit\")) #> # A tibble: 3 × 3 #> id gender unit #> #> 1 1 m 123 #> 2 2 f 455 #> 3 3 f 123 # 2. by length df %>% separate_wider_position(x, c(gender = 1, 1, unit = 3)) #> # A tibble: 3 × 3 #> id gender unit #> #> 1 1 m 123 #> 2 2 f 455 #> 3 3 f 123 # 3. 
defining each component with a regular expression df %>% separate_wider_regex(x, c(gender = \".\", \".\", unit = \"\\\\d+\")) #> # A tibble: 3 × 3 #> id gender unit #> #> 1 1 m 123 #> 2 2 f 455 #> 3 3 f 123 # Sometimes you split on the \"last\" delimiter df <- tibble(var = c(\"race_1\", \"race_2\", \"age_bucket_1\", \"age_bucket_2\")) # _delim won't help because it always splits on the first delimiter try(df %>% separate_wider_delim(var, \"_\", names = c(\"var1\", \"var2\"))) #> Error in separate_wider_delim(., var, \"_\", names = c(\"var1\", \"var2\")) : #> Expected 2 pieces in each element of `var`. #> ! 2 values were too long. #> ℹ Use `too_many = \"debug\"` to diagnose the problem. #> ℹ Use `too_many = \"drop\"/\"merge\"` to silence this message. df %>% separate_wider_delim(var, \"_\", names = c(\"var1\", \"var2\"), too_many = \"merge\") #> # A tibble: 4 × 2 #> var1 var2 #> #> 1 race 1 #> 2 race 2 #> 3 age bucket_1 #> 4 age bucket_2 # Instead, you can use _regex df %>% separate_wider_regex(var, c(var1 = \".*\", \"_\", var2 = \".*\")) #> # A tibble: 4 × 2 #> var1 var2 #> #> 1 race 1 #> 2 race 2 #> 3 age_bucket 1 #> 4 age_bucket 2 # this works because * is greedy; you can mimic the _delim behaviour with .*? df %>% separate_wider_regex(var, c(var1 = \".*?\", \"_\", var2 = \".*\")) #> # A tibble: 4 × 2 #> var1 var2 #> #> 1 race 1 #> 2 race 2 #> 3 age bucket_1 #> 4 age bucket_2 # If the number of components varies, it's most natural to split into rows df <- tibble(id = 1:4, x = c(\"x\", \"x y\", \"x y z\", NA)) df %>% separate_longer_delim(x, delim = \" \") #> # A tibble: 7 × 2 #> id x #> #> 1 1 x #> 2 2 x #> 3 2 y #> 4 3 x #> 5 3 y #> 6 3 z #> 7 4 NA # But separate_wider_delim() provides some tools to deal with the problem # The default behaviour tells you that there's a problem try(df %>% separate_wider_delim(x, delim = \" \", names = c(\"a\", \"b\"))) #> Error in separate_wider_delim(., x, delim = \" \", names = c(\"a\", \"b\")) : #> Expected 2 pieces in each element of `x`. #> ! 1 value was too short. #> ℹ Use `too_few = \"debug\"` to diagnose the problem. #> ℹ Use `too_few = \"align_start\"/\"align_end\"` to silence this message. #> ! 1 value was too long. #> ℹ Use `too_many = \"debug\"` to diagnose the problem. #> ℹ Use `too_many = \"drop\"/\"merge\"` to silence this message. # You can get additional insight by using the debug options df %>% separate_wider_delim( x, delim = \" \", names = c(\"a\", \"b\"), too_few = \"debug\", too_many = \"debug\" ) #> Warning: Debug mode activated: adding variables `x_ok`, `x_pieces`, and #> `x_remainder`. 
#> # A tibble: 4 × 7 #> id a b x x_ok x_pieces x_remainder #> #> 1 1 x NA x FALSE 1 \"\" #> 2 2 x y x y TRUE 2 \"\" #> 3 3 x y x y z FALSE 3 \" z\" #> 4 4 NA NA NA TRUE NA NA # But you can suppress the warnings df %>% separate_wider_delim( x, delim = \" \", names = c(\"a\", \"b\"), too_few = \"align_start\", too_many = \"merge\" ) #> # A tibble: 4 × 3 #> id a b #> #> 1 1 x NA #> 2 2 x y #> 3 3 x y z #> 4 4 NA NA # Or choose to automatically name the columns, producing as many as needed df %>% separate_wider_delim(x, delim = \" \", names_sep = \"\", too_few = \"align_start\") #> # A tibble: 4 × 4 #> id x1 x2 x3 #> #> 1 1 x NA NA #> 2 2 x y NA #> 3 3 x y z #> 4 4 NA NA NA"},{"path":"https://tidyr.tidyverse.org/dev/reference/smiths.html","id":null,"dir":"Reference","previous_headings":"","what":"Some data about the Smith family — smiths","title":"Some data about the Smith family — smiths","text":"small demo dataset describing John Mary Smith.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/smiths.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Some data about the Smith family — smiths","text":"","code":"smiths"},{"path":"https://tidyr.tidyverse.org/dev/reference/smiths.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Some data about the Smith family — smiths","text":"data frame 2 rows 5 columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":null,"dir":"Reference","previous_headings":"","what":"Spread a key-value pair across multiple columns — spread","title":"Spread a key-value pair across multiple columns — spread","text":"Development spread() complete, new code recommend switching pivot_wider(), easier use, featureful, still active development. df %>% spread(key, value) equivalent df %>% pivot_wider(names_from = key, values_from = value) See details vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Spread a key-value pair across multiple columns — spread","text":"","code":"spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Spread a key-value pair across multiple columns — spread","text":"data data frame. key, value Columns use key value. fill set, missing values replaced value. Note two types missingness input: explicit missing values (.e. NA), implicit missings, rows simply present. types missing value replaced fill. convert TRUE, type.convert() asis = TRUE run new columns. useful value column mix variables coerced string. class value column factor date, note true new columns produced, coerced character type conversion. drop FALSE, keep factor levels appear data, filling missing combinations fill. sep NULL, column names taken values key variable. 
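A minimal sketch (data made up here, not part of the rendered examples) of the fill behaviour described a few lines above: both kinds of missingness are replaced.

library(tidyr)
df <- tibble(
  x = c("a", "a", "b"),
  key = c("k1", "k2", "k1"),
  value = c(1, NA, 3)
)
df %>% spread(key, value, fill = 0)
# a/k2 (an explicit NA) and b/k2 (an implicitly missing row) both become 0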
non-NULL, column names given \"\".","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/spread.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Spread a key-value pair across multiple columns — spread","text":"","code":"stocks <- tibble( time = as.Date(\"2009-01-01\") + 0:9, X = rnorm(10, 0, 1), Y = rnorm(10, 0, 2), Z = rnorm(10, 0, 4) ) stocksm <- stocks %>% gather(stock, price, -time) stocksm %>% spread(stock, price) #> # A tibble: 10 × 4 #> time X Y Z #> #> 1 2009-01-01 -2.05 -1.40 0.00192 #> 2 2009-01-02 0.151 1.95 3.02 #> 3 2009-01-03 -0.293 -0.154 1.37 #> 4 2009-01-04 0.255 1.79 0.674 #> 5 2009-01-05 -0.553 -1.56 5.59 #> 6 2009-01-06 1.41 0.874 -2.72 #> 7 2009-01-07 -0.795 0.827 2.95 #> 8 2009-01-08 -1.57 1.95 -3.44 #> 9 2009-01-09 -1.04 2.29 1.68 #> 10 2009-01-10 1.02 2.43 5.80 stocksm %>% spread(time, price) #> # A tibble: 3 × 11 #> stock `2009-01-01` `2009-01-02` `2009-01-03` `2009-01-04` `2009-01-05` #> #> 1 X -2.05 0.151 -0.293 0.255 -0.553 #> 2 Y -1.40 1.95 -0.154 1.79 -1.56 #> 3 Z 0.00192 3.02 1.37 0.674 5.59 #> # ℹ 5 more variables: `2009-01-06` , `2009-01-07` , #> # `2009-01-08` , `2009-01-09` , `2009-01-10` # Spread and gather are complements df <- tibble(x = c(\"a\", \"b\"), y = c(3, 4), z = c(5, 6)) df %>% spread(x, y) %>% gather(\"x\", \"y\", a:b, na.rm = TRUE) #> # A tibble: 2 × 3 #> z x y #> #> 1 5 a 3 #> 2 6 b 4 # Use 'convert = TRUE' to produce variables of mixed type df <- tibble( row = rep(c(1, 51), each = 3), var = rep(c(\"Sepal.Length\", \"Species\", \"Species_num\"), 2), value = c(5.1, \"setosa\", 1, 7.0, \"versicolor\", 2) ) df %>% spread(var, value) %>% str() #> tibble [2 × 4] (S3: tbl_df/tbl/data.frame) #> $ row : num [1:2] 1 51 #> $ Sepal.Length: chr [1:2] \"5.1\" \"7\" #> $ Species : chr [1:2] \"setosa\" \"versicolor\" #> $ Species_num : chr [1:2] \"1\" \"2\" df %>% spread(var, value, convert = TRUE) %>% str() #> tibble [2 × 4] (S3: tbl_df/tbl/data.frame) #> $ row : num [1:2] 1 51 #> $ Sepal.Length: num [1:2] 5.1 7 #> $ Species : chr [1:2] \"setosa\" \"versicolor\" #> $ Species_num : int [1:2] 1 2"},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":null,"dir":"Reference","previous_headings":"","what":"Example tabular representations — table1","title":"Example tabular representations — table1","text":"Data sets demonstrate multiple ways layout tabular data.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Example tabular representations — table1","text":"","code":"table1 table2 table3 table4a table4b table5"},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Example tabular representations — table1","text":"https://www..int/teams/global-tuberculosis-programme/data","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/table1.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Example tabular representations — table1","text":"table1, table2, table3, table4a, table4b, table5 display number TB cases documented World Health Organization Afghanistan, Brazil, China 1999 2000. data contains values associated four variables (country, year, cases, population), table organizes values different layout. 
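As a sketch of the point above, the tidy layout of table1 can be recovered from one of the other layouts: table4a stores cases with one column per year, so pivoting it longer yields country/year/cases.

library(tidyr)
table4a %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")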
data subset data contained World Health Organization Global Tuberculosis Report","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr-package.html","id":null,"dir":"Reference","previous_headings":"","what":"tidyr: Tidy Messy Data — tidyr-package","title":"tidyr: Tidy Messy Data — tidyr-package","text":"Tools help create tidy data, column variable, row observation, cell contains single value. 'tidyr' contains tools changing shape (pivoting) hierarchy (nesting 'unnesting') dataset, turning deeply nested lists rectangular data frames ('rectangling'), extracting values string columns. also includes tools working missing values (implicit explicit).","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"tidyr: Tidy Messy Data — tidyr-package","text":"Maintainer: Hadley Wickham hadley@posit.co Authors: Davis Vaughan davis@posit.co Maximilian Girlich contributors: Kevin Ushey kevin@posit.co [contributor] Posit Software, PBC [copyright holder, funder]","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_data_masking.html","id":null,"dir":"Reference","previous_headings":"","what":"Argument type: data-masking — tidyr_data_masking","title":"Argument type: data-masking — tidyr_data_masking","text":"page describes argument modifier indicates argument uses data masking, sub-type tidy evaluation. never heard tidy evaluation , start practical introduction https://r4ds.hadley.nz/functions.html#data-frame-functions read underlying theory https://rlang.r-lib.org/reference/topic-data-mask.html.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_data_masking.html","id":"key-techniques","dir":"Reference","previous_headings":"","what":"Key techniques","title":"Argument type: data-masking — tidyr_data_masking","text":"allow user supply column name function argument, embrace argument, e.g. filter(df, {{ var }}). work column name recorded string, use .data pronoun, e.g. summarise(df, mean = mean(.data[[var]])). suppress R CMD check NOTEs unknown variables use .data$var instead var: also need import .data rlang (e.g.) @importFrom rlang .data.","code":"dist_summary <- function(df, var) { df %>% summarise(n = n(), min = min({{ var }}), max = max({{ var }})) } mtcars %>% dist_summary(mpg) mtcars %>% group_by(cyl) %>% dist_summary(mpg) for (var in names(mtcars)) { mtcars %>% count(.data[[var]]) %>% print() } lapply(names(mtcars), function(var) mtcars %>% count(.data[[var]])) # has NOTE df %>% mutate(z = x + y) # no NOTE df %>% mutate(z = .data$x + .data$y)"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_data_masking.html","id":"dot-dot-dot-","dir":"Reference","previous_headings":"","what":"Dot-dot-dot (...)","title":"Argument type: data-masking — tidyr_data_masking","text":"... automatically provides indirection, can use (.e. without embracing) inside function: can also use := instead = enable glue-like syntax creating variables user supplied data: Learn https://rlang.r-lib.org/reference/topic-data-mask-programming.html.","code":"grouped_mean <- function(df, var, ...) { df %>% group_by(...) 
%>% summarise(mean = mean({{ var }})) } var_name <- \"l100km\" mtcars %>% mutate(\"{var_name}\" := 235 / mpg) summarise_mean <- function(df, var) { df %>% summarise(\"mean_of_{{var}}\" := mean({{ var }})) } mtcars %>% group_by(cyl) %>% summarise_mean(mpg)"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":null,"dir":"Reference","previous_headings":"","what":"Legacy name repair — tidyr_legacy","title":"Legacy name repair — tidyr_legacy","text":"Ensures column names unique using approach found tidyr 0.8.3 earlier. use function want preserve naming strategy, otherwise better adopting new tidyverse standard name_repair = \"universal\"","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Legacy name repair — tidyr_legacy","text":"","code":"tidyr_legacy(nms, prefix = \"V\", sep = \"\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Legacy name repair — tidyr_legacy","text":"nms Character vector names prefix prefix Prefix use unnamed column sep Separator use name unique suffix","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_legacy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Legacy name repair — tidyr_legacy","text":"","code":"df <- tibble(x = 1:2, y = list(tibble(x = 3:5), tibble(x = 4:7))) # Doesn't work because it would produce a data frame with two # columns called x if (FALSE) { # \\dontrun{ unnest(df, y) } # } # The new tidyverse standard: unnest(df, y, names_repair = \"universal\") #> New names: #> • `x` -> `x...1` #> • `x` -> `x...2` #> # A tibble: 7 × 2 #> x...1 x...2 #> #> 1 1 3 #> 2 1 4 #> 3 1 5 #> 4 2 4 #> 5 2 5 #> 6 2 6 #> 7 2 7 # The old tidyr approach unnest(df, y, names_repair = tidyr_legacy) #> # A tibble: 7 × 2 #> x x1 #> #> 1 1 3 #> 2 1 4 #> 3 1 5 #> 4 2 4 #> 5 2 5 #> 6 2 6 #> 7 2 7"},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_tidy_select.html","id":null,"dir":"Reference","previous_headings":"","what":"Argument type: tidy-select — tidyr_tidy_select","title":"Argument type: tidy-select — tidyr_tidy_select","text":"page describes argument modifier indicates argument uses tidy selection, sub-type tidy evaluation. never heard tidy evaluation , start practical introduction https://r4ds.hadley.nz/functions.html#data-frame-functions read underlying theory https://rlang.r-lib.org/reference/topic-data-mask.html.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_tidy_select.html","id":"overview-of-selection-features","dir":"Reference","previous_headings":"","what":"Overview of selection features","title":"Argument type: tidy-select — tidyr_tidy_select","text":"tidyselect implements DSL selecting variables. provides helpers selecting variables: var1:var10: variables lying var1 left var10 right. starts_with(\"\"): names start \"\". ends_with(\"z\"): names end \"z\". contains(\"b\"): names contain \"b\". matches(\"x.y\"): names match regular expression x.y. num_range(x, 1:4): names following pattern, x1, x2, ..., x4. all_of(vars)/any_of(vars): matches names stored character vector vars. all_of(vars) error variables present; any_of(var) match just variables exist. everything(): variables. last_col(): furthest column right. (.numeric): variables .numeric() returns TRUE. well operators combining selections: !selection: variables match selection. 
selection1 & selection2: variables included selection1 selection2. selection1 | selection2: variables match either selection1 selection2.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/tidyr_tidy_select.html","id":"key-techniques","dir":"Reference","previous_headings":"","what":"Key techniques","title":"Argument type: tidy-select — tidyr_tidy_select","text":"want user supply tidyselect specification function argument, need tunnel selection function argument. done embracing function argument {{ }}, e.g unnest(df, {{ vars }}). character vector column names, use all_of() any_of(), depending whether want unknown variable names cause error, e.g unnest(df, all_of(vars)), unnest(df, !any_of(vars)). suppress R CMD check NOTEs unknown variables use \"var\" instead var:","code":"# has NOTE df %>% select(x, y, z) # no NOTE df %>% select(\"x\", \"y\", \"z\")"},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":null,"dir":"Reference","previous_headings":"","what":"","title":"","text":"Performs opposite operation dplyr::count(), duplicating rows according weighting variable (expression).","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"","text":"","code":"uncount(data, weights, ..., .remove = TRUE, .id = NULL)"},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"","text":"data data frame, tibble, grouped tibble. weights vector weights. Evaluated context data; supports quasiquotation. ... Additional arguments passed methods. .remove TRUE, weights name column data, column removed. .id Supply string create new variable gives unique identifier created row.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/uncount.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"","text":"","code":"df <- tibble(x = c(\"a\", \"b\"), n = c(1, 2)) uncount(df, n) #> # A tibble: 3 × 1 #> x #> #> 1 a #> 2 b #> 3 b uncount(df, n, .id = \"id\") #> # A tibble: 3 × 2 #> x id #> #> 1 a 1 #> 2 b 1 #> 3 b 2 # You can also use constants uncount(df, 2) #> # A tibble: 4 × 2 #> x n #> #> 1 a 1 #> 2 a 1 #> 3 b 2 #> 4 b 2 # Or expressions uncount(df, 2 / n) #> # A tibble: 3 × 2 #> x n #> #> 1 a 1 #> 2 a 1 #> 3 b 2"},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":null,"dir":"Reference","previous_headings":"","what":"Unite multiple columns into one by pasting strings together — unite","title":"Unite multiple columns into one by pasting strings together — unite","text":"Convenience function paste together multiple columns one.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unite multiple columns into one by pasting strings together — unite","text":"","code":"unite(data, col, ..., sep = \"_\", remove = TRUE, na.rm = FALSE)"},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unite multiple columns into one by pasting strings together — unite","text":"data data frame. col name new column, string symbol. argument passed expression supports quasiquotation (can unquote strings symbols). 
name captured expression rlang::ensym() (note kind interface symbols represent actual objects now discouraged tidyverse; support backward compatibility). ... Columns unite sep Separator use values. remove TRUE, remove input columns output data frame. na.rm TRUE, missing values removed prior uniting value.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unite.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unite multiple columns into one by pasting strings together — unite","text":"","code":"df <- expand_grid(x = c(\"a\", NA), y = c(\"b\", NA)) df #> # A tibble: 4 × 2 #> x y #> #> 1 a b #> 2 a NA #> 3 NA b #> 4 NA NA df %>% unite(\"z\", x:y, remove = FALSE) #> # A tibble: 4 × 3 #> z x y #> #> 1 a_b a b #> 2 a_NA a NA #> 3 NA_b NA b #> 4 NA_NA NA NA # To remove missing values: df %>% unite(\"z\", x:y, na.rm = TRUE, remove = FALSE) #> # A tibble: 4 × 3 #> z x y #> #> 1 \"a_b\" a b #> 2 \"a\" a NA #> 3 \"b\" NA b #> 4 \"\" NA NA # Separate is almost the complement of unite df %>% unite(\"xy\", x:y) %>% separate(xy, c(\"x\", \"y\")) #> # A tibble: 4 × 2 #> x y #> #> 1 a b #> 2 a NA #> 3 NA b #> 4 NA NA # (but note `x` and `y` contain now \"NA\" not NA)"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":null,"dir":"Reference","previous_headings":"","what":"Unnest a list-column of data frames into rows and columns — unnest","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"Unnest expands list-column containing data frames rows columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"","code":"unnest( data, cols, ..., keep_empty = FALSE, ptype = NULL, names_sep = NULL, names_repair = \"check_unique\", .drop = deprecated(), .id = deprecated(), .sep = deprecated(), .preserve = deprecated() )"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"data data frame. cols List-columns unnest. selecting multiple columns, values row recycled common size. ... : previously write df %>% unnest(x, y, z). Convert df %>% unnest(c(x, y, z)). previously created new variable unnest() now need explicitly mutate(). Convert df %>% unnest(y = fun(x, y, z)) df %>% mutate(y = fun(x, y, z)) %>% unnest(y). keep_empty default, get one row output element list unchopping/unnesting. means size-0 element (like NULL empty data frame vector), entire row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements single row missing values. ptype Optionally, named list column name-prototype pairs coerce cols , overriding default guessed combining individual values. Alternatively, single empty ptype can supplied, applied cols. names_sep NULL, default, outer names come inner names. string, outer names formed pasting together outer inner column names, separated names_sep. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. 
formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . .drop, .preserve : list-columns now preserved; want output use select() remove prior unnesting. .id : convert df %>% unnest(x, .id = \"id\") df %>% mutate(id = names(x)) %>% unnest(x)). .sep : use names_sep instead.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"new-syntax","dir":"Reference","previous_headings":"","what":"New syntax","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"tidyr 1.0.0 introduced new syntax nest() unnest() designed similar functions. Converting new syntax straightforward (guided message receive) just need run old analysis, can easily revert previous behaviour using nest_legacy() unnest_legacy() follows:","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unnest a list-column of data frames into rows and columns — unnest","text":"","code":"# unnest() is designed to work with lists of data frames df <- tibble( x = 1:3, y = list( NULL, tibble(a = 1, b = 2), tibble(a = 1:3, b = 3:1, c = 4) ) ) # unnest() recycles input rows for each row of the list-column # and adds a column for each column df %>% unnest(y) #> # A tibble: 4 × 4 #> x a b c #> #> 1 2 1 2 NA #> 2 3 1 3 4 #> 3 3 2 2 4 #> 4 3 3 1 4 # input rows with 0 rows in the list-column will usually disappear, # but you can keep them (generating NAs) with keep_empty = TRUE: df %>% unnest(y, keep_empty = TRUE) #> # A tibble: 5 × 4 #> x a b c #> #> 1 1 NA NA NA #> 2 2 1 2 NA #> 3 3 1 3 4 #> 4 3 2 2 4 #> 5 3 3 1 4 # Multiple columns ---------------------------------------------------------- # You can unnest multiple columns simultaneously df <- tibble( x = 1:2, y = list( tibble(a = 1, b = 2), tibble(a = 3:4, b = 5:6) ), z = list( tibble(c = 1, d = 2), tibble(c = 3:4, d = 5:6) ) ) df %>% unnest(c(y, z)) #> # A tibble: 3 × 5 #> x a b c d #> #> 1 1 1 2 1 2 #> 2 2 3 5 3 5 #> 3 2 4 6 4 6 # Compare with unnesting one column at a time, which generates # the Cartesian product df %>% unnest(y) %>% unnest(z) #> # A tibble: 5 × 5 #> x a b c d #> #> 1 1 1 2 1 2 #> 2 2 3 5 3 5 #> 3 2 3 5 4 6 #> 4 2 4 6 3 5 #> 5 2 4 6 4 6"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_auto.html","id":null,"dir":"Reference","previous_headings":"","what":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","title":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","text":"unnest_auto() picks unnest_wider() unnest_longer() inspecting inner names list-col: elements unnamed, uses unnest_longer(indices_include = FALSE). elements named, least one name common across components, uses unnest_wider(). Otherwise, falls back unnest_longer(indices_include = TRUE). 
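A small sketch (made-up list-columns) of the heuristic just described:

library(tidyr)
df <- tibble(
  x = 1:2,
  named = list(c(a = 1, b = 2), c(a = 3, b = 4)),
  unnamed = list(1:2, 3:5)
)
df %>% unnest_auto(named)    # shared inner names, so unnest_wider() is chosen
df %>% unnest_auto(unnamed)  # unnamed elements, so unnest_longer() is chosen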
handy rapid interactive exploration recommend using scripts, succeed even underlying data radically changes.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_auto.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","text":"","code":"unnest_auto(data, col)"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_auto.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Automatically call unnest_wider() or unnest_longer() — unnest_auto","text":"data data frame. col List-column unnest.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":null,"dir":"Reference","previous_headings":"","what":"Unnest a list-column into rows — unnest_longer","title":"Unnest a list-column into rows — unnest_longer","text":"unnest_longer() turns element list-column row. naturally suited list-columns elements unnamed length element varies row row. unnest_longer() generally preserves number columns x modifying number rows. Learn vignette(\"rectangle\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unnest a list-column into rows — unnest_longer","text":"","code":"unnest_longer( data, col, values_to = NULL, indices_to = NULL, indices_include = NULL, keep_empty = FALSE, names_repair = \"check_unique\", simplify = TRUE, ptype = NULL, transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unnest a list-column into rows — unnest_longer","text":"data data frame. col List-column(s) unnest. selecting multiple columns, values row recycled common size. values_to string giving column name (names) store unnested values . multiple columns specified col, can also glue string containing \"{col}\" provide template column names. default, NULL, gives output columns names input columns. indices_to string giving column name (names) store inner names positions (named) values. multiple columns specified col, can also glue string containing \"{col}\" provide template column names. default, NULL, gives output columns names values_to, suffixed \"_id\". indices_include single logical value specifying whether add index column. value inner names, index column character vector names, otherwise integer vector positions. NULL, defaults TRUE value inner names indices_to provided. indices_to provided, indices_include FALSE. keep_empty default, get one row output element list unchopping/unnesting. means size-0 element (like NULL empty data frame vector), entire row dropped output. want preserve rows, use keep_empty = TRUE replace size-0 elements single row missing values. names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . simplify TRUE, attempt simplify lists length-1 vectors atomic vector. 
Can also named list containing TRUE FALSE declaring whether attempt simplify particular column. named list provided, default unspecified columns TRUE. ptype Optionally, named list prototypes declaring desired output type component. Alternatively, single empty prototype can supplied, applied components. Use argument want check element type expect simplifying. ptype specified, simplify = FALSE simplification possible, list-column returned element type ptype. transform Optionally, named list transformation functions applied component. Alternatively, single function can supplied, applied components. Use argument want transform parse individual elements extracted. ptype transform supplied, transform applied ptype.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_longer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unnest a list-column into rows — unnest_longer","text":"","code":"# `unnest_longer()` is useful when each component of the list should # form a row df <- tibble( x = 1:4, y = list(NULL, 1:3, 4:5, integer()) ) df %>% unnest_longer(y) #> # A tibble: 5 × 2 #> x y #> #> 1 2 1 #> 2 2 2 #> 3 2 3 #> 4 3 4 #> 5 3 5 # Note that empty values like `NULL` and `integer()` are dropped by # default. If you'd like to keep them, set `keep_empty = TRUE`. df %>% unnest_longer(y, keep_empty = TRUE) #> # A tibble: 7 × 2 #> x y #> #> 1 1 NA #> 2 2 1 #> 3 2 2 #> 4 2 3 #> 5 3 4 #> 6 3 5 #> 7 4 NA # If the inner vectors are named, the names are copied to an `_id` column df <- tibble( x = 1:2, y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12)) ) df %>% unnest_longer(y) #> # A tibble: 5 × 3 #> x y y_id #> #> 1 1 1 a #> 2 1 2 b #> 3 2 10 a #> 4 2 11 b #> 5 2 12 c # Multiple columns ---------------------------------------------------------- # If columns are aligned, you can unnest simultaneously df <- tibble( x = 1:2, y = list(1:2, 3:4), z = list(5:6, 7:8) ) df %>% unnest_longer(c(y, z)) #> # A tibble: 4 × 3 #> x y z #> #> 1 1 1 5 #> 2 1 2 6 #> 3 2 3 7 #> 4 2 4 8 # This is important because sequential unnesting would generate the # Cartesian product of the rows df %>% unnest_longer(y) %>% unnest_longer(z) #> # A tibble: 8 × 3 #> x y z #> #> 1 1 1 5 #> 2 1 1 6 #> 3 1 2 5 #> 4 1 2 6 #> 5 2 3 7 #> 6 2 3 8 #> 7 2 4 7 #> 8 2 4 8"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":null,"dir":"Reference","previous_headings":"","what":"Unnest a list-column into columns — unnest_wider","title":"Unnest a list-column into columns — unnest_wider","text":"unnest_wider() turns element list-column column. naturally suited list-columns every element named, names consistent row--row. unnest_wider() preserves rows x modifying columns. Learn vignette(\"rectangle\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unnest a list-column into columns — unnest_wider","text":"","code":"unnest_wider( data, col, names_sep = NULL, simplify = TRUE, strict = FALSE, names_repair = \"check_unique\", ptype = NULL, transform = NULL )"},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unnest a list-column into columns — unnest_wider","text":"data data frame. col List-column(s) unnest. selecting multiple columns, values row recycled common size. names_sep NULL, default, names left . 
string, outer inner names pasted together using names_sep separator. values unnested unnamed, names_sep must supplied, otherwise error thrown. names_sep supplied, names automatically generated unnamed values increasing sequence integers. simplify TRUE, attempt simplify lists length-1 vectors atomic vector. Can also named list containing TRUE FALSE declaring whether attempt simplify particular column. named list provided, default unspecified columns TRUE. strict single logical specifying whether apply strict vctrs typing rules. FALSE, typed empty values (like list() integer()) nested within list-columns treated like NULL contribute type unnested column. useful working JSON, empty values tend lose type information show list(). names_repair Used check output data frame valid names. Must one following options: \"minimal\": name repair checks, beyond basic existence, \"unique\": make sure names unique empty, \"check_unique\": (default), name repair, check unique, \"universal\": make names unique syntactic function: apply custom name repair. tidyr_legacy: use name repair tidyr 0.8. formula: purrr-style anonymous function (see rlang::as_function()) See vctrs::vec_as_names() details terms strategies used enforce . ptype Optionally, named list prototypes declaring desired output type component. Alternatively, single empty prototype can supplied, applied components. Use argument want check element type expect simplifying. ptype specified, simplify = FALSE simplification possible, list-column returned element type ptype. transform Optionally, named list transformation functions applied component. Alternatively, single function can supplied, applied components. Use argument want transform parse individual elements extracted. ptype transform supplied, transform applied ptype.","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/unnest_wider.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unnest a list-column into columns — unnest_wider","text":"","code":"df <- tibble( character = c(\"Toothless\", \"Dory\"), metadata = list( list( species = \"dragon\", color = \"black\", films = c( \"How to Train Your Dragon\", \"How to Train Your Dragon 2\", \"How to Train Your Dragon: The Hidden World\" ) ), list( species = \"blue tang\", color = \"blue\", films = c(\"Finding Nemo\", \"Finding Dory\") ) ) ) df #> # A tibble: 2 × 2 #> character metadata #> #> 1 Toothless #> 2 Dory # Turn all components of metadata into columns df %>% unnest_wider(metadata) #> # A tibble: 2 × 4 #> character species color films #> #> 1 Toothless dragon black #> 2 Dory blue tang blue # Choose not to simplify list-cols of length-1 elements df %>% unnest_wider(metadata, simplify = FALSE) #> # A tibble: 2 × 4 #> character species color films #> #> 1 Toothless #> 2 Dory df %>% unnest_wider(metadata, simplify = list(color = FALSE)) #> # A tibble: 2 × 4 #> character species color films #> #> 1 Toothless dragon #> 2 Dory blue tang # You can also widen unnamed list-cols: df <- tibble( x = 1:3, y = list(NULL, 1:3, 4:5) ) # but you must supply `names_sep` to do so, which generates automatic names: df %>% unnest_wider(y, names_sep = \"_\") #> # A tibble: 3 × 4 #> x y_1 y_2 y_3 #> #> 1 1 NA NA NA #> 2 2 1 2 3 #> 3 3 4 5 NA # 0-length elements --------------------------------------------------------- # The defaults of `unnest_wider()` treat empty types (like `list()`) as `NULL`. 
json <- list( list(x = 1:2, y = 1:2), list(x = list(), y = 3:4), list(x = 3L, y = list()) ) df <- tibble(json = json) df %>% unnest_wider(json) #> # A tibble: 3 × 2 #> x y #> #> 1 #> 2 #> 3 # To instead enforce strict vctrs typing rules, use `strict` df %>% unnest_wider(json, strict = TRUE) #> # A tibble: 3 × 2 #> x y #> #> 1 #> 2 #> 3 "},{"path":"https://tidyr.tidyverse.org/dev/reference/us_rent_income.html","id":null,"dir":"Reference","previous_headings":"","what":"US rent and income data — us_rent_income","title":"US rent and income data — us_rent_income","text":"Captured 2017 American Community Survey using tidycensus package.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/us_rent_income.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"US rent and income data — us_rent_income","text":"","code":"us_rent_income"},{"path":"https://tidyr.tidyverse.org/dev/reference/us_rent_income.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"US rent and income data — us_rent_income","text":"dataset variables: GEOID FIP state identifier NAME Name state variable Variable name: income = median yearly income, rent = median monthly rent estimate Estimated value moe 90% margin error","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":null,"dir":"Reference","previous_headings":"","what":"World Health Organization TB data — who","title":"World Health Organization TB data — who","text":"subset data World Health Organization Global Tuberculosis Report, accompanying global populations. uses original codes World Health Organization. column names columns 5 60 made combining new_ : method diagnosis (rel = relapse, sn = negative pulmonary smear, sp = positive pulmonary smear, ep = extrapulmonary), gender (f = female, m = male), age group (014 = 0-14 yrs age, 1524 = 15-24, 2534 = 25-34, 3544 = 35-44 years age, 4554 = 45-54, 5564 = 55-64, 65 = 65 years older). who2 lightly modified version makes teaching basics easier tweaking variables slightly consistent dropping iso2 iso3. newrel replaced new_rel, _ added gender.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"World Health Organization TB data — who","text":"","code":"who who2 population"},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"who","dir":"Reference","previous_headings":"","what":"who","title":"World Health Organization TB data — who","text":"data frame 7,240 rows 60 columns: country Country name iso2, iso3 2 & 3 letter ISO country codes year Year new_sp_m014 - new_rel_f65 Counts new TB cases recorded group. 
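A sketch of how these encoded column names are typically unpacked; the names_pattern is an assumption based on the naming scheme described above, with an optional underscore to handle the inconsistent "newrel" prefix:

library(tidyr)
who %>%
  pivot_longer(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"),
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  )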
Column names encode three variables describe group.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"who-","dir":"Reference","previous_headings":"","what":"who2","title":"World Health Organization TB data — who","text":"data frame 7,240 rows 58 columns.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"population","dir":"Reference","previous_headings":"","what":"population","title":"World Health Organization TB data — who","text":"data frame 4,060 rows three columns: country Country name year Year population Population","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/who.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"World Health Organization TB data — who","text":"https://www..int/teams/global-tuberculosis-programme/data","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":null,"dir":"Reference","previous_headings":"","what":"Population data from the World Bank — world_bank_pop","title":"Population data from the World Bank — world_bank_pop","text":"Data population World Bank.","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Population data from the World Bank — world_bank_pop","text":"","code":"world_bank_pop"},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Population data from the World Bank — world_bank_pop","text":"dataset variables: country Three letter country code indicator Indicator name: SP.POP.GROW = population growth, SP.POP.TOTL = total population, SP.URB.GROW = urban population growth, SP.URB.TOTL = total urban population 2000-2018 Value year","code":""},{"path":"https://tidyr.tidyverse.org/dev/reference/world_bank_pop.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Population data from the World Bank — world_bank_pop","text":"Dataset World Bank data bank: https://data.worldbank.org","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-development-version","dir":"Changelog","previous_headings":"","what":"tidyr (development version)","title":"tidyr (development version)","text":"pivot_wider_spec() now throws informative error non-data frame inputs (@catalamarti, #1510). tidyr now requires dplyr >=1.1.0 (#1568, @catalamarti).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-131","dir":"Changelog","previous_headings":"","what":"tidyr 1.3.1","title":"tidyr 1.3.1","text":"CRAN release: 2024-01-24 pivot_wider now uses .|> syntax dplyr helper message identify duplicates (@boshek, #1516)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-130","dir":"Changelog","previous_headings":"","what":"tidyr 1.3.0","title":"tidyr 1.3.0","text":"CRAN release: 2023-01-24","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-1-3-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 1.3.0","text":"New family consistent string separating functions: separate_wider_delim(), separate_wider_position(), separate_wider_regex(), separate_longer_delim(), separate_longer_position(). functions thorough refreshes separate() extract(), featuring improved performance, greater consistency, polished API, new approach handling problems. 
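One member of the family listed above that is not demonstrated elsewhere on this page, sketched with made-up data:

library(tidyr)
df <- tibble(id = 1:2, x = c("ab", "xyz"))
df %>% separate_longer_position(x, width = 1)
# each character becomes its own row; `id` is recycled to match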
use stringr supersede extract(), separate(), separate_rows() (#1304). named character vector interface used separate_wider_regex() similar nc package Toby Dylan Hocking. nest() gains .argument allows specify columns nest (rather columns nest, .e. ...). Additionally, .key argument longer deprecated, used whenever ... isn’t specified (#1458). unnest_longer() gains keep_empty argument like unnest() (#1339). pivot_longer() gains cols_vary argument controlling ordering output rows relative original row number (#1312). New datasets who2, household, cms_patient_experience, cms_patient_care demonstrate various tidying challenges (#1333).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-1-3-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 1.3.0","text":"... argument pivot_longer() pivot_wider() moved front function signature, required arguments optional ones. Additionally, pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), build_wider_spec() gained ... arguments similar location. change allows us easily add new features pivoting functions without breaking existing CRAN packages user scripts. pivot_wider() provides temporary backwards compatible support case single unnamed argument previously positionally matched id_cols. one special case still works, throw warning encouraging explicitly name id_cols argument. read pattern, see https://design.tidyverse.org/dots--required.html (#1350).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"lifecycle-changes-1-3-0","dir":"Changelog","previous_headings":"","what":"Lifecycle changes","title":"tidyr 1.3.0","text":"functions deprecated tidyr 1.0 1.2 (old lazyeval functions ending _ various arguments unnest()) now warn every use. made defunct 2024 (#1406).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-3-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.3.0","text":"unnest_longer() now consistently drops rows either NULL empty vectors (like integer()) default. Set new keep_empty argument TRUE retain . Previously, keep_empty = TRUE implicitly used NULL, keep_empty = FALSE used empty vectors, inconsistent tidyr verbs argument (#1363). unnest_longer() now uses \"\" index column fully unnamed vectors. also now consistently uses NA index column empty vectors “kept” keep_empty = TRUE (#1442). unnest_wider() now errors values unnested unnamed names_sep provided (#1367). unnest_wider() now generates automatic names partially unnamed vectors. Previously generated fully unnamed vectors, resulting strange mix automatic names name-repaired names (#1367).","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"general-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"General","title":"tidyr 1.3.0","text":"tidyr functions now consistently disallow renaming tidy-selection. Renaming never meaningful functions, previously either effect caused problems (#1449, #1104). tidyr errors (including input validation) thoroughly reviewed generally likely point right direction (#1313, #1400). uncount() now generic implementations can provided objects data frames (@mgirlich, #1358). uncount() gains ... argument. comes required optional arguments (@mgirlich, #1358). nest(), complete(), expand(), fill() now document support grouped data frames created dplyr::group_by() (#952). built datasets now standard tibbles (#1459). 
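A sketch (assumed data) of the grouped-data-frame support noted a few items above, using fill():

library(tidyr)
df <- tibble(g = c(1, 1, 2, 2), x = c(1, NA, NA, 3))
df %>%
  dplyr::group_by(g) %>%
  fill(x, .direction = "downup")
# filling happens within each group, so group 2 is filled with 3,
# not with the trailing value from group 1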
R >=3.4.0 now required, line tidyverse standard supporting previous 5 minor releases R. rlang >=1.0.4 vctrs >=0.5.2 now required (#1344, #1470). Removed dependency ellipsis favor equivalent functions rlang (#1314).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-packing-and-chopping-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Nesting, packing, and chopping","title":"tidyr 1.3.0","text":"unnest(), unchop(), unnest_longer(), unnest_wider() better handle lists additional classes (#1327). pack(), unpack(), chop(), unchop() gain error_call argument, turn improves error calls shown nest() various unnest() adjacent functions (#1446). chop(), unpack(), unchop() gain ..., must empty (#1447). unpack() better job reporting column name duplication issues gives better advice resolve using names_sep. also improves errors functions use unpack(), like unnest() unnest_wider() (#1425, #1367).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Pivoting","title":"tidyr 1.3.0","text":"pivot_longer() longer supports interpreting values_ptypes = list() names_ptypes = list() NULL. empty list() now interpreted prototype apply columns, consistent 0-length value interpreted (#1296). pivot_longer(values_drop_na = TRUE) faster aren’t missing values drop (#1392, @mgirlich). pivot_longer() now memory efficient due usage vctrs::vec_interleave() (#1310, @mgirlich). pivot_longer() now throws slightly better error message values_ptypes names_ptypes provided coercion can’t made (#1364). pivot_wider() now throws better error message column selected names_from values_from also selected id_cols (#1318). pivot_wider() now faster names_sep provided (@mgirlich, #1426). pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), build_wider_spec() gain error_call argument, resulting better error reporting pivot_longer() pivot_wider() (#1408).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"missing-values-1-3-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Missing values","title":"tidyr 1.3.0","text":"fill() now works correctly column named .direction data (#1319, @tjmahr). replace_na() faster aren’t missing values replace (#1392, @mgirlich). documentation replace argument replace_na() now mentions replace always cast type data (#1317).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-121","dir":"Changelog","previous_headings":"","what":"tidyr 1.2.1","title":"tidyr 1.2.1","text":"CRAN release: 2022-09-08 Hot patch release resolve R CMD check failures.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-120","dir":"Changelog","previous_headings":"","what":"tidyr 1.2.0","title":"tidyr 1.2.0","text":"CRAN release: 2022-02-01","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-1-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 1.2.0","text":"complete() expand() longer allow complete expand grouping column. never well-defined since completion/expansion grouped data frame happens “within” group otherwise potential produce erroneous results (#1299). replace_na() longer allows type data change replacement applied. replace now always cast type data replacement made. example, means using replacement value 1.5 integer column longer allowed. 
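A sketch of the stricter casting described above:

library(tidyr)
df <- tibble(x = c(1L, NA))
df %>% replace_na(list(x = 0L))        # fine: the replacement is an integer
try(df %>% replace_na(list(x = 1.5)))  # errors: 1.5 cannot be cast to integer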
Similarly, replacing missing values list-column must now done list(\"foo\") rather just \"foo\".","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-2-0","dir":"Changelog","previous_headings":"","what":"Pivoting","title":"tidyr 1.2.0","text":"pivot_wider() gains new names_expand id_expand arguments turning implicit missing factor levels variable combinations explicit ones. similar drop argument spread() (#770). pivot_wider() gains new names_vary argument controlling ordering combining names_from values values_from column names (#839). pivot_wider() gains new unused_fn argument controlling summarize unused columns aren’t involved pivoting process (#990, thanks @mgirlich initial implementation). pivot_longer()’s names_transform values_transform arguments now accept single function applied columns (#1284, thanks @smingerson initial implementation). pivot_longer()’s names_ptypes values_ptypes arguments now accept single empty ptype applied columns (#1284).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-2-0","dir":"Changelog","previous_headings":"","what":"Nesting","title":"tidyr 1.2.0","text":"unnest() unchop()’s ptype argument now accepts single empty ptype applied cols (#1284). unpack() now silently skips non-data frame columns specified cols. matches existing behavior unchop() unnest() (#1153).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-2-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.2.0","text":"unnest_wider() unnest_longer() can now unnest multiple columns (#740). unnest_longer()’s indices_to values_to arguments now accept glue specification, useful unnesting multiple columns. hoist(), unnest_longer(), unnest_wider(), ptype supplied, column can’t simplified, result list-column element type ptype (#998). unnest_wider() gains new strict argument controls whether strict vctrs typing rules applied. defaults FALSE backwards compatibility, often useful lax unnesting JSON, doesn’t always map one--one R’s types (#1125). hoist(), unnest_longer(), unnest_wider()’s simplify argument now accepts named list TRUE FALSE control simplification per column basis (#995). hoist(), unnest_longer(), unnest_wider()’s transform argument now accepts single function applied components (#1284). hoist(), unnest_longer(), unnest_wider()’s ptype argument now accepts single empty ptype applied components (#1284).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"grids-1-2-0","dir":"Changelog","previous_headings":"","what":"Grids","title":"tidyr 1.2.0","text":"complete() gains new explicit argument limiting fill implicit missing values. useful don’t want fill pre-existing missing values (#1270). complete() gains grouped data frame method. generates correct completed data frame groups involved (#396, #966).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"missing-values-1-2-0","dir":"Changelog","previous_headings":"","what":"Missing values","title":"tidyr 1.2.0","text":"drop_na(), replace_na(), fill() updated utilize vctrs. means can use functions wider variety column types, including lubridate’s Period types (#1094), data frame columns, rcrd type vctrs. replace_na() longer replaces empty atomic elements list-columns (like integer(0)). value replaced list-column NULL (#1168). drop_na() longer drops empty atomic elements list-columns (like integer(0)). 
value dropped list-column NULL (#1228).","code":""},{"path":[]},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"general-1-2-0","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"General","title":"tidyr 1.2.0","text":"@mgirlich now tidyr author recognition significant sustained contributions. lazyeval variants tidyr verbs soft-deprecated. Expect move defunct stage next minor release tidyr (#1294). any_of() all_of() tidyselect now re-exported (#1217). dplyr >= 1.0.0 now required.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Pivoting","title":"tidyr 1.2.0","text":"pivot_wider() now gives better advice identify duplicates values uniquely identified (#1113). pivot_wider() now throws informative error values_fn doesn’t result single summary value (#1238). pivot_wider() pivot_longer() now generate informative errors related name repair (#987). pivot_wider() now works correctly values_fill data frame. pivot_wider() longer accidentally retains values_from pivoting zero row data frame (#1249). pivot_wider() now correctly handles case id column name collides value names_from (#1107). pivot_wider() pivot_longer() now check spec columns .name .value character vectors. Additionally, .name column must unique (#1107). pivot_wider()’s names_from values_from arguments now required default values name value don’t correspond columns data. Additionally, must identify least 1 column data (#1240). pivot_wider()’s values_fn argument now correctly allows anonymous functions (#1114). pivot_wider_spec() now works correctly 0-row data frame spec doesn’t identify rows (#1250, #1252). pivot_longer()’s names_ptypes argument now applied names_transform consistency rectangling functions (.e. hoist()) (#1233). check_pivot_spec() new developer facing function validating pivot spec argument. useful extending pivot_longer() pivot_wider() new S3 methods (#1087).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Nesting","title":"tidyr 1.2.0","text":"nest() generic now avoids computing .data, making compatible lazy tibbles (#1134). .names_sep argument data.frame method nest() now actually used (#1174). unnest()’s ptype argument now works expected (#1158). unpack() longer drops empty columns specified cols (#1191). unpack() now works correctly data frame columns containing 1 row 0 columns (#1189). chop() now works correctly data frames 0 rows (#1206). chop()’s cols argument longer optional. matches behavior cols seen elsewhere tidyr (#1205). unchop() now respects ptype unnesting non-list column (#1211).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Rectangling","title":"tidyr 1.2.0","text":"hoist() longer accidentally removes elements duplicated names (#1259).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"grids-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Grids","title":"tidyr 1.2.0","text":"grouped data frame methods complete() expand() now move group columns front result (addition columns completed expanded, already moved front). make intuitive sense, completing expanding “within” group, group columns first thing see (#1289). 
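A sketch (assumed data) of per-group completion as described above:

library(tidyr)
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 1), y = c(10, 20, 30))
df %>%
  dplyr::group_by(g) %>%
  complete(x = 1:2)
# each group is completed separately: "b" gains an x = 2 row with y = NA,
# and the grouping column g is moved to the front of the result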
complete() now applies fill even columns complete specified (#1272). expand(), crossing(), nesting() now correctly retain NA values factors (#1275). expand_grid(), expand(), nesting(), crossing() now silently apply name repair automatically named inputs. avoids number issues resulting duplicate truncated names (#1116, #1221, #1092, #1037, #992). expand_grid(), expand(), nesting(), crossing() now allow columns unnamed data frames used expressions data frame specified, like expand_grid(tibble(x = 1), y = x). consistent tibble() behaves. expand_grid(), expand(), nesting(), crossing() now work correctly data frames containing 0 columns >0 rows (#1189). expand_grid(), expand(), nesting(), crossing() now return 1 row data frame inputs supplied, consistent prod() == 1L idea computations involving number combinations computed empty set return 1 (#1258).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"missing-values-1-2-0-1","dir":"Changelog","previous_headings":"Bug fixes and minor improvements","what":"Missing values","title":"tidyr 1.2.0","text":"drop_na() longer drops missing values columns tidyselect expression results 0 columns selected used (#1227). fill() now treats NaN like missing value (#982).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-114","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.4","title":"tidyr 1.1.4","text":"CRAN release: 2021-09-27 expand_grid() now twice fast pivot_wider() bit faster (@mgirlich, #1130). unchop() now much faster, propagates various functions, unnest(), unnest_longer(), unnest_wider(), separate_rows() (@mgirlich, @DavisVaughan, #1127). unnest() now much faster (@mgirlich, @DavisVaughan, #1127). unnest() longer allows unnesting list-col containing mix vector data frame elements. Previously, worked accident, considered -label usage unnest() now become error.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-113","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.3","title":"tidyr 1.1.3","text":"CRAN release: 2021-03-03 tidyr verbs longer “default” methods lazyeval fallbacks. means ’ll get clearer error messages (#1036). uncount() error non-integer weights gives clearer error message negative weights (@mgirlich, #1069). can unnest dates (#1021, #1089). pivot_wider() works data.table empty key variables (@mgirlich, #1066). separate_rows() works factor columns (@mgirlich, #1058).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-112","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.2","title":"tidyr 1.1.2","text":"CRAN release: 2020-08-27 separate_rows() returns 1.1.0 behaviour empty strings (@rjpatm, #1014).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-111","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.1","title":"tidyr 1.1.1","text":"CRAN release: 2020-07-31 New tidyr logo! stringi dependency removed; substantial dependency make tidyr hard compile resource constrained environments (@rjpat, #936). Replace Rcpp cpp11. 
See https://cpp11.r-lib.org/articles/motivations.html reasons .","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-110","dir":"Changelog","previous_headings":"","what":"tidyr 1.1.0","title":"tidyr 1.1.0","text":"CRAN release: 2020-05-20","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"general-features-1-1-0","dir":"Changelog","previous_headings":"","what":"General features","title":"tidyr 1.1.0","text":"pivot_longer(), hoist(), unnest_wider(), unnest_longer() gain new transform arguments; allow transform values “flight”. partly needed vctrs coercion rules become stricter, give greater flexibility available previously (#921). Arguments use tidy selection syntax now clearly documented updated use tidyselect 1.1.0 (#872).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-improvements-1-1-0","dir":"Changelog","previous_headings":"","what":"Pivoting improvements","title":"tidyr 1.1.0","text":"pivot_wider() pivot_longer() considerably performant, thanks largely improvements underlying vctrs code (#790, @DavisVaughan). pivot_longer() now supports names_to = character() prevents name column created (#961). {r} df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6) df %>% pivot_longer(-id, names_to = character()) pivot_longer() longer creates .copy variable presence duplicate column names. makes consistent handling non-unique specs. pivot_longer() automatically disambiguates non-unique ouputs, can occur input variables include additional component don’t care want discard (#792, #793). {r} df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6) df %>% pivot_longer(-id, names_pattern = \"(.)_.\") df %>% pivot_longer(-id, names_sep = \"_\", names_to = c(\"name\", NA)) df %>% pivot_longer(-id, names_sep = \"_\", names_to = c(\".value\", NA)) pivot_wider() gains names_sort argument allows sort column names order. default, FALSE, orders columns first appearance (#839). future version, ’ll consider changing default TRUE. pivot_wider() gains names_glue argument allows construct output column names glue specification. pivot_wider() arguments values_fn values_fill can now single values; now need use named list want use different values different value columns (#739, #746). also get improved errors ’re expected type.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-1-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.1.0","text":"hoist() now automatically names pluckers single string (#837). error use duplicated column names (@mgirlich, #834), now uses rlang::list2() behind scenes (means can now use !!! :=) (#801). unnest_longer(), unnest_wider(), hoist() better job simplifying list-cols. longer add unneeded unspecified() result still list (#806), work list contains non-vectors (#810, #848). unnest_wider(names_sep = \"\") now provides default names unnamed inputs, suppressing many previous name repair messages (#742).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-1-0","dir":"Changelog","previous_headings":"","what":"Nesting","title":"tidyr 1.1.0","text":"pack() nest() gains .names_sep argument allows strip outer names inner names, symmetrical way argument unpack() unnest() combines inner outer names (#795, #797). unnest_wider() unnest_longer() can now unnest list_of columns. 
important unnesting columns created nest() pivot_wider(), create list_of columns id columns non-unique (#741).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-1-1-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 1.1.0","text":"chop() now creates list-columns class vctrs::list_of(). helps keep track type case chopped data frame empty, allowing unchop() reconstitute data frame correct number types column even observations. drop_na() now preserves attributes unclassed vectors (#905). expand(), expand_grid(), crossing(), nesting() evaluate inputs iteratively, can refer freshly created columns, e.g. crossing(x = seq(-2, 2), y = x) (#820). expand(), expand_grid(), crossing(), nesting() gain .name_repair giving control name repair strategy (@jeffreypullin, #798). extract() lets use NA , documented (#793). extract(), separate(), hoist(), unnest_longer(), unnest_wider() give better error message col missing (#805). pack()’s first argument now .data instead data (#759). pivot_longer() now errors values_to length-1 character vector (#949). pivot_longer() pivot_wider() now generic implementations can provided objects data frames (#800). pivot_wider() can now pivot data frame columns (#926) unite(na.rm = TRUE) now works types variable, just character vectors (#765). unnest_wider() gives better error message attempt unnest multiple columns (#740). unnest_auto() works input data contains column called col (#959).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-102","dir":"Changelog","previous_headings":"","what":"tidyr 1.0.2","title":"tidyr 1.0.2","text":"CRAN release: 2020-01-24 Minor fixes dev versions rlang, tidyselect, tibble.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-101","dir":"Changelog","previous_headings":"","what":"tidyr 1.0.1","title":"tidyr 1.0.1","text":"exist since accidentally released v1.0.2","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-100","dir":"Changelog","previous_headings":"","what":"tidyr 1.0.0","title":"tidyr 1.0.0","text":"CRAN release: 2019-09-11","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-1-0-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 1.0.0","text":"See vignette(\"-packages\") detailed transition guide. nest() unnest() new syntax. majority existing usage automatically translated new syntax warning. doesn’t work, put script use old versions can take closer look update code: nest() now preserves grouping, implications downstream calls group-aware functions, dplyr::mutate() filter(). first argument nest() changed data .data. unnest() uses emerging tidyverse standard disambiguate unique names. Use names_repair = tidyr_legacy request previous approach. unnest_()/nest_() lazyeval methods unnest()/nest() now defunct. deprecated time, , since interface changed, package authors need update avoid deprecation warnings. think one clean break less work everyone. lazyeval functions formally deprecated, made defunct next major release. (See lifecycle vignette details deprecation stages). crossing() nesting() now return 0-row outputs input length-0 vector. want preserve previous behaviour silently dropped inputs, convert empty vectors NULL. 
(discussion general pattern https://github.com/tidyverse/principles/issues/24)","code":"library(tidyr) nest <- nest_legacy unnest <- unnest_legacy"},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"pivoting-1-0-0","dir":"Changelog","previous_headings":"","what":"Pivoting","title":"tidyr 1.0.0","text":"New pivot_longer() pivot_wider() provide modern alternatives spread() gather(). carefully redesigned easier learn remember, include many new features. Learn vignette(\"pivot\"). functions resolve multiple existing issues spread()/gather(). functions now handle mulitple value columns (#149/#150), support vector types (#333), use tidyverse conventions duplicated column names (#496, #478), symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), can directly split column names multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), control generated column names (#208). demonstrate functions work practice, tidyr gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters world_bank_pop. Finally, tidyr demos removed. dated, superseded vignette(\"pivot\").","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"rectangling-1-0-0","dir":"Changelog","previous_headings":"","what":"Rectangling","title":"tidyr 1.0.0","text":"tidyr contains four new functions support rectangling, turning deeply nested list tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), hoist(). documented new vignette: vignette(\"rectangle\"). unnest_longer() unnest_wider() make easier unnest list-columns vectors either rows columns (#418). unnest_auto() automatically picks _longer() _wider() using heuristics based presence common names. New hoist() provides convenient way plucking components list-column top-level columns (#341). particularly useful working deeply nested JSON, provides convenient shortcut mutate() + map() pattern: {r} df %>% hoist(metadata, name = \"name\") # shortcut df %>% mutate(name = map_chr(metadata, \"name\"))","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nesting-1-0-0","dir":"Changelog","previous_headings":"","what":"Nesting","title":"tidyr 1.0.0","text":"nest() unnest() updated new interfaces closely aligned evolving tidyverse conventions. use theory developed vctrs consistently handle mixtures input types, arguments overhauled based last years experience. supported new vignette(\"nest\"), outlines main ideas nested data (’s still rough, get better time). biggest change operation multiple columns: df %>% unnest(x, y, z) becomes df %>% unnest(c(x, y, z)) df %>% nest(x, y, z) becomes df %>% nest(data = c(x, y, z)). done best ensure common uses nest() unnest() continue work, generating informative warning telling precisely need update code. Please file issue ’ve missed important use case. unnest() overhauled: New keep_empty parameter ensures every row input gets least one row output, inserting missing values needed (#358). Provides names_sep argument control inner outer column names combined. Uses standard tidyverse name-repair rules, default get error output contain multiple columns name. can override using name_repair (#514). 
Now supports NULL entries (#436).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"packing-and-chopping-1-0-0","dir":"Changelog","previous_headings":"","what":"Packing and chopping","title":"tidyr 1.0.0","text":"hood, nest() unnest() implemented chop(), pack(), unchop(), unpack(): pack() unpack() allow pack unpack columns data frame columns (#523). chop() unchop() chop rows sets list-columns. Packing chopping interesting primarily atomic operations underlying nesting (similarly, unchop unpacking underlie unnesting), don’t expect used directly often.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-1-0-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 1.0.0","text":"New expand_grid(), tidy version expand.grid(), lower-level existing expand() crossing() functions, takes individual vectors, sort uniquify . crossing(), nesting(), expand() rewritten use vctrs package. affect much existing code, considerably simplies implementation ensures functions work consistently across generalised vectors (#557). part alignment, functions now drop NULL inputs, 0-length vector.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-1-0-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 1.0.0","text":"full_seq() now also works gaps observations shorter given period, within tolerance given tol. Previously, gaps consecutive observations range [period, period + tol]; gaps can now range [period - tol, period + tol] (@ha0ye, #657). tidyr now re-exports tibble(), as_tibble(), tribble(), well tidyselect helpers (starts_with(), ends_width(), …). makes generating documentation, reprexes, tests easier, makes tidyr easier use without also attaching dplyr. functions take ... instrumented functions ellipsis package warn ’ve supplied arguments ignored (typically ’ve misspelled argument name) (#573). complete() now uses full_join() levels preserved even levels specified (@Ryo-N7, #493). crossing() now takes unique values data frame inputs, just vector inputs (#490). gather() throws error column data frame (#553). extract() (hence pivot_longer()) can extract multiple input values single output column (#619). fill() now implemented using dplyr::mutate_at(). radically simplifies implementation considerably improves performance working grouped data (#520). fill() now accepts downup updown fill directions (@coolbutuseless, #505). unite() gains na.rm argument, making easier remove missing values prior uniting values together (#203)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-083","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.3","title":"tidyr 0.8.3","text":"CRAN release: 2019-03-01 crossing() preserves factor levels (#410), now works list-columns (#446, @SamanthaToet). (also help expand() built top crossing()) nest() compatible dplyr 0.8.0. spread() works id variable names (#525). unnest() preserves column unnested input zero-length (#483), using list_of() attribute correctly restore columns, possible. unnest() run named unnamed list-columns length (@hlendway, #460).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-082","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.2","title":"tidyr 0.8.2","text":"CRAN release: 2018-10-28 separate() now accepts NA column name argument denote columns omitted result. (@markdly, #397). 
Minor updates ensure compatibility dependencies.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-081","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.1","title":"tidyr 0.8.1","text":"CRAN release: 2018-05-18 unnest() weakens test “atomicity” restore previous behaviour unnesting factors dates (#407).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-080","dir":"Changelog","previous_headings":"","what":"tidyr 0.8.0","title":"tidyr 0.8.0","text":"CRAN release: 2018-01-29","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-0-8-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 0.8.0","text":"deliberate breaking changes release. However, number packages failing errors related numbers elements columns, row names. possible accidental API changes new bugs. see error package, sincerely appreciate minimal reprex. separate() now correctly uses -1 refer far right position, instead -2. depended behaviour, ’ll need switch packageVersion(\"tidyr\") > \"0.7.2\"","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-0-8-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 0.8.0","text":"Increased test coverage 84% 99%. uncount() performs inverse operation dplyr::count() (#279)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-8-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.8.0","text":"complete(data) now returns data rather throwing error (#390). complete() zero-length completions returns original input (#331). crossing() preserves NAs (#364). expand() empty input gives empty data frame instead NULL (#331). expand(), crossing(), complete() now complete empty factors instead dropping (#270, #285) extract() better error message regex contain expected number groups (#313). drop_na() longer drops columns (@jennybryan, #245), works list-cols (#280). Equivalent NA list column empty (length 0) data structure. nest() now faster, especially long data frame collapsed nested data frame rows. nest() zero-row data frame works expected (#320). replace_na() longer complains try replace missing values variables present data (#356). replace_na() now also works vectors (#342, @flying-sheep), can replace NULL list-columns. throws better error message attempt replace something length 1. separate() longer checks ... empty, allowing methods make use . check added tidyr 0.4.0 (2016-02-02) deprecate previous behaviour ... passed strsplit(). separate() extract() now insert columns correct position drop = TRUE (#394). separate() now works correctly counts RHS using negative integer sep values (@markdly, #315). separate() gets improved warning message pieces aren’t expected (#375). separate_rows() supports list columns (#321), works empty tibbles. spread() now consistently returns 0 row outputs 0 row inputs (#269). spread() now works key column includes NA drop FALSE (#254). spread() longer returns tibbles row names (#322). spread(), separate(), extract() (#255), gather() (#347) now replace existing variables rather creating invalid data frame duplicated variable names (matching semantics mutate). unite() now works (documented) don’t supply variables (#355). unnest() gains preserve argument allows preserve list columns without unnesting (#328). unnest() can unnested list-columns contains lists lists (#278). 
unnest(df) now works df contains list-cols (#344)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-072","dir":"Changelog","previous_headings":"","what":"tidyr 0.7.2","title":"tidyr 0.7.2","text":"CRAN release: 2017-10-16 SE variants gather_(), spread_() nest_() now treat non-syntactic names way pre tidy eval versions tidyr (#361). Fix tidyr bug revealed R-devel.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-071","dir":"Changelog","previous_headings":"","what":"tidyr 0.7.1","title":"tidyr 0.7.1","text":"CRAN release: 2017-09-01 hotfix release account tidyselect changes unit tests. Note upcoming version tidyselect backtracks changes announced 0.7.0. special evaluation semantics selection changed back old behaviour new rules causing much trouble confusion. now data expressions (symbols calls : c()) can refer registered variables objects context. However semantics context expressions (calls : c()) remain . expressions evaluated context refer registered variables. ’re writing functions refer contextual objects, still good idea avoid data expressions following advice 0.7.0 release notes.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-070","dir":"Changelog","previous_headings":"","what":"tidyr 0.7.0","title":"tidyr 0.7.0","text":"CRAN release: 2017-08-16 release includes important changes tidyr internals. Tidyr now supports new tidy evaluation framework quoting (NSE) functions. also uses new tidyselect package selecting backend.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"breaking-changes-0-7-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"tidyr 0.7.0","text":"see error messages objects functions found, likely selecting functions now stricter arguments example selecting function gather() ... argument. change makes code robust disallowing ambiguous scoping. Consider following code: select first three columns (using x defined global environment), select first two columns (using column named x)? solve ambiguity, now make strict distinction data context expressions. data expression either bare name expression like x:y c(x, y). data expression, can refer columns data frame. Everything else context expression can refer objects defined <-. practice means can longer refer contextual objects like : now explicit find objects. , can use quasiquotation operator !! evaluate argument early inline result: {r} mtcars %>% gather(var, value, !! 1:ncol(mtcars)) mtcars %>% gather(var, value, !! 1:x) mtcars %>% gather(var, value, !! -(1:x)) alternative turn data expression context expression using seq() seq_len() instead :. See section tidyselect information semantics. Following switch tidy evaluation, might see warnings “variable context set”. likely caused supplying helpers like everything() underscored versions tidyr verbs. Helpers always evaluated lazily. fix , just quote helper formula: drop_na(df, ~everything()). selecting functions now stricter supply integer positions. see error along lines please round positions supplying tidyr. Double vectors fine long rounded.","code":"x <- 3 df <- tibble(w = 1, x = 2, y = 3) gather(df, \"variable\", \"value\", 1:x) mtcars %>% gather(var, value, 1:ncol(mtcars)) x <- 3 mtcars %>% gather(var, value, 1:x) mtcars %>% gather(var, value, -(1:x)) `-0.949999999999999`, `-0.940000000000001`, ... 
must resolve to integer column positions, not a double vector"},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"switch-to-tidy-evaluation-0-7-0","dir":"Changelog","previous_headings":"","what":"Switch to tidy evaluation","title":"tidyr 0.7.0","text":"tidyr now tidy evaluation grammar. See programming vignette dplyr practical information tidy evaluation. tidyr port bit special. philosophy tidy evaluation R code refer real objects (data frame context), make exceptions rule tidyr. reason several functions accept bare symbols specify names new columns create (gather() prime example). tidy symbol represent actual object. workaround capture arguments using rlang::quo_name() (still support quasiquotation can unquote symbols strings). type NSE now discouraged tidyverse: symbols R code represent real objects. Following switch tidy eval underscored variants softly deprecated. However remain around time without warning backward compatibility.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"switch-to-the-tidyselect-backend-0-7-0","dir":"Changelog","previous_headings":"","what":"Switch to the tidyselect backend","title":"tidyr 0.7.0","text":"selecting backend dplyr extracted standalone package tidyselect tidyr now uses selecting variables. used selecting multiple variables (drop_na()) well single variables (col argument extract() separate(), key value arguments spread()). implies following changes: arguments selecting single variable now support features dplyr::pull(). can supply name position, including negative positions. Multiple variables now selected bit differently. now make strict distinction data context expressions. data expression either bare name expression like x:y c(x, y). data expression, can refer columns data frame. Everything else context expression can refer objects defined <-. can still refer contextual objects data expression explicit. One way explicit unquote variable environment tidy eval operator !!: hand, select helpers like start_with() context expressions. therefore easy refer objects never ambiguous data columns: {r} x <- \"d\" drop_na(df, starts_with(x)) special rules contrast dplyr tidyr verbs (data context scope) make sense selecting functions provide robust helpful semantics.","code":"x <- 2 drop_na(df, 2) # Works fine drop_na(df, x) # Object 'x' not found drop_na(df, !! 
x) # Works as if you had supplied 2"},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-063","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.3","title":"tidyr 0.6.3","text":"CRAN release: 2017-05-15 Patch tests compatible dev tibble","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-062","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.2","title":"tidyr 0.6.2","text":"CRAN release: 2017-05-04 Register C functions Added package docs Patch tests compatible dev dplyr.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-061","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.1","title":"tidyr 0.6.1","text":"CRAN release: 2017-01-10 Patch test compatible dev tibble Changed deprecation message extract_numeric() point readr::parse_number() rather readr::parse_numeric()","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-060","dir":"Changelog","previous_headings":"","what":"tidyr 0.6.0","title":"tidyr 0.6.0","text":"CRAN release: 2016-08-12","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"api-changes-0-6-0","dir":"Changelog","previous_headings":"","what":"API changes","title":"tidyr 0.6.0","text":"drop_na() removes observations NA given variables. variables given, variables considered (#194, @janschulz). extract_numeric() deprecated (#213). Renamed table4 table5 table4a table4b make connection clear. key value variables table2 renamed type count.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-6-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.6.0","text":"expand(), crossing(), nesting() now silently drop zero-length inputs. crossing_() nesting_() versions crossing() nesting() take list input. full_seq() works correctly dates date/times.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-051","dir":"Changelog","previous_headings":"","what":"tidyr 0.5.1","title":"tidyr 0.5.1","text":"CRAN release: 2016-06-14 Restored compatibility R < 3.3.0 avoiding getS3method(envir = ) (#205, @krlmlr).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-050","dir":"Changelog","previous_headings":"","what":"tidyr 0.5.0","title":"tidyr 0.5.0","text":"CRAN release: 2016-06-12","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-functions-0-5-0","dir":"Changelog","previous_headings":"","what":"New functions","title":"tidyr 0.5.0","text":"separate_rows() separates observations multiple delimited values separate rows (#69, @aaronwolen).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-5-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.5.0","text":"complete() preserves grouping created dplyr (#168). expand() (hence complete()) preserves ordered attribute factors (#165). full_seq() preserve attributes dates date/times (#156), sequences longer need start 0. gather() can now gather together list columns (#175), gather_.data.frame(na.rm = TRUE) now removes missing values ’re actually present (#173). nest() returns correct output every variable nested (#186). separate() fills right--left (left--right!) fill = “left” (#170, @dgrtwo). separate() unite() now automatically drop removed variables grouping (#159, #177). spread() gains sep argument. 
-null, name columns “keyvalue”. Additionally, sep NULL missing values converted (#68). spread() works presence list-columns (#199) unnest() works non-syntactic names (#190). unnest() gains sep argument. non-null, rename columns nested data frames include original column name, nested column name, separated .sep (#184). unnest() gains .id argument works way bind_rows(). useful named list data frames vectors (#125). Moved useful sample datasets DSR package. Made compatible dplyr 0.4 0.5. tidyr functions create new columns aggressive re-encoding column names UTF-8.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-041","dir":"Changelog","previous_headings":"","what":"tidyr 0.4.1","title":"tidyr 0.4.1","text":"CRAN release: 2016-02-05 Fixed bug nest() nested data ending wrong row (#158).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-040","dir":"Changelog","previous_headings":"","what":"tidyr 0.4.0","title":"tidyr 0.4.0","text":"CRAN release: 2016-01-18","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"nested-data-frames-0-4-0","dir":"Changelog","previous_headings":"","what":"Nested data frames","title":"tidyr 0.4.0","text":"nest() unnest() overhauled support useful way structuring data frames: nested data frame. grouped data frame, one row per observation, additional metadata define groups. nested data frame, one row per group, individual observations stored column list data frames. useful structure lists objects (like models) one element per group. nest() now produces single list data frames called “data” rather list column variable. Nesting variables included nested data frames. also works grouped data frames made dplyr::group_by(). can override default column name .key. unnest() gains .drop argument controls happens list columns. default, ’re kept output doesn’t require row duplication; otherwise ’re dropped. unnest() now mutate() semantics ... - allows unnest transformed columns easily. (Previously used select semantics).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"expanding-0-4-0","dir":"Changelog","previous_headings":"","what":"Expanding","title":"tidyr 0.4.0","text":"expand() allows evaluate arbitrary expressions like full_seq(year). previously using c() created nested combinations, ’ll now need use nesting() (#85, #121). nesting() crossing() allow create nested crossed data frames individual vectors. crossing() similar base::expand.grid() full_seq(x, period) creates full sequence values min(x) max(x) every period values.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"minor-bug-fixes-and-improvements-0-4-0","dir":"Changelog","previous_headings":"","what":"Minor bug fixes and improvements","title":"tidyr 0.4.0","text":"fill() fills NULLs list-columns. fill() gains direction argument can fill either upwards downwards (#114). gather() now stores key column character, default. revert previous behaviour using factor (allows preserve ordering columns), use key_factor = TRUE (#96). tidyr verbs right thing grouped data frames created group_by() (#122, #129, #81). seq_range() removed. never used announced. spread() creates columns mixed type convert = TRUE (#118, @jennybc). spread() drop = FALSE handles zero-length factors (#56). spread()ing data frame key value columns creates one row output (#41). unite() now removes old columns adding new (#89, @krlmlr). 
separate() now warns defunct … argument used (#151, @krlmlr).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-031","dir":"Changelog","previous_headings":"","what":"tidyr 0.3.1","title":"tidyr 0.3.1","text":"CRAN release: 2015-09-10 Fixed bug attributes non-gather columns lost (#104)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-030","dir":"Changelog","previous_headings":"","what":"tidyr 0.3.0","title":"tidyr 0.3.0","text":"CRAN release: 2015-09-08","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-features-0-3-0","dir":"Changelog","previous_headings":"","what":"New features","title":"tidyr 0.3.0","text":"New complete() provides wrapper around expand(), left_join() replace_na() common task: completing data frame missing combinations variables. fill() fills missing values column last non-missing value (#4). New replace_na() makes easy replace missing values something meaningful data. nest() complement unnest() (#3). unnest() can now work multiple list-columns time. don’t supply columns names, unlist list-columns (#44). unnest() can also handle columns lists data frames (#58).","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-3-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.3.0","text":"tidyr longer depends reshape2. fix issues also try load reshape (#88). %>% re-exported magrittr. expand() now supports nesting crossing (see examples details). comes expense creating new variables inline (#46). expand_ SE evaluation correctly can pass character vector columns names (list formulas etc) (#70). extract() 10x faster now uses stringi instead base R regular expressions. also returns NA instead throwing error regular expression doesn’t match (#72). extract() separate() preserve character vectors convert TRUE (#99). internals spread() rewritten, now preserve attributes input value column. means can now spread date (#62) factor (#35) inputs. spread() gives informative error message key value don’t exist input data (#36). separate() displays first 20 failures (#50). finer control happens two matches: can fill missing values either “left” “right” (#49). separate() longer throws error number pieces aren’t expected - instead uses drops extra values fills right gives warning. input NA separate() extract() return silently return NA outputs, rather throwing error. (#77) Experimental unnest() method lists removed.","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"tidyr-020","dir":"Changelog","previous_headings":"","what":"tidyr 0.2.0","title":"tidyr 0.2.0","text":"CRAN release: 2014-12-05","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"new-functions-0-2-0","dir":"Changelog","previous_headings":"","what":"New functions","title":"tidyr 0.2.0","text":"Experimental expand() function (#21). Experiment unnest() function converting named lists data frames. (#3, #22)","code":""},{"path":"https://tidyr.tidyverse.org/dev/news/index.html","id":"bug-fixes-and-minor-improvements-0-2-0","dir":"Changelog","previous_headings":"","what":"Bug fixes and minor improvements","title":"tidyr 0.2.0","text":"extract_numeric() preserves negative signs (#20). gather() better defaults key value supplied. ... omitted, gather() selects columns (#28). Performance now comparable reshape2::melt() (#18). separate() gains extra argument lets control happens extra pieces. 
default throw “error”, can also “merge” “drop”. spread() gains drop argument, allows preserve missing factor levels (#25). converts factor value variables character vectors, instead embedding matrix inside data frame (#35).","code":""}]
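As a minimal, runnable sketch of the hoist() shortcut quoted in the tidyr 1.0.0 rectangling entry indexed above (which pairs `df %>% hoist(metadata, name = "name")` with the equivalent mutate() + map() pattern) — the `df`, `metadata`, `name`, and `score` objects below are hypothetical examples, not taken from the index:

# Toy data: a tibble with a list-column of named lists, the shape hoist() targets.
library(tidyr)
library(dplyr)
library(purrr)

df <- tibble(
  id = 1:2,
  metadata = list(
    list(name = "a", score = 1),
    list(name = "b", score = 2)
  )
)

# hoist() plucks the "name" element of each metadata entry into a top-level column
# (and, by default, removes that element from the remaining list-column).
df %>% hoist(metadata, name = "name")

# The longer mutate() + map() spelling that hoist() is described as a shortcut for.
df %>% mutate(name = map_chr(metadata, "name"))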