I would like to propose a feature that I have personally found useful. Maybe something that does this already exists and I just missed it, but if not, I think it would fit best in the tidyr package, and a possible name is `extend()`.
It would duplicate the dataframe and row-bind the copies, with the identifiers in selected columns overwritten by a placeholder (e.g. `NA` or `'all'`). Every combination of the selected columns gets its own row-bound copy of the original dataframe. The extended dataframe can then be fed to `mutate()` or `summarise()` to calculate a mean or any other statistic. Unlike the usual approach, the result would contain not only rows with the means of the groups, but also a row with the mean of the entire dataframe (if a single column was extended), or multiple rows for the different combinations of grouping columns.
For example, after `extend()`-ing a dataframe with the height of individuals grouped by country and gender, it could be summarized not only by each country-gender combination, but also by gender alone, by country alone, and over the entire dataset, all in a single output. In this simple example the workaround isn't too long: group the dataframe differently and summarize again. But I worked on a case with several such combinations of columns, where grouping and summarizing separately would have taken a lot of space and effort.
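To make the workaround concrete, here is a minimal sketch of the manual approach for the country/gender example. The toy data `heights`, the column name `mean_height`, and the `"all"` placeholder are assumptions chosen for illustration only; every call used is standard dplyr.

```r
library(dplyr)

# Toy data; the values are made up purely for illustration.
heights <- tibble(
  country = c("DE", "DE", "FR", "FR"),
  gender  = c("f", "m", "f", "m"),
  height  = c(165, 180, 163, 178)
)

# One summary per grouping, each written out separately.
by_both <- heights %>%
  group_by(country, gender) %>%
  summarise(mean_height = mean(height), .groups = "drop")

by_country <- heights %>%
  group_by(country) %>%
  summarise(mean_height = mean(height), .groups = "drop") %>%
  mutate(gender = "all")   # mark the collapsed column with the placeholder

by_gender <- heights %>%
  group_by(gender) %>%
  summarise(mean_height = mean(height), .groups = "drop") %>%
  mutate(country = "all")

overall <- heights %>%
  summarise(mean_height = mean(height)) %>%
  mutate(country = "all", gender = "all")

# Stack everything into a single output (columns are matched by name).
manual <- bind_rows(by_both, by_country, by_gender, overall)
```

With two columns this already takes four separate pipelines; with k columns it grows to 2^k, which is the repetition the proposal aims to remove.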
I have a fully functional package. It is not as elegant as it could be, but I think it implements all the necessary logic and handles the special cases. Is there interest in such a feature?
The core of my function is something like the code below, but I added other things as well. One is the option to exclude particular combinations of columns. Another is not duplicating certain groups when the column to be extended has only a single unique entry within them, because extending by that column would not be very useful. (The columns to be extended cannot be columns used for grouping.)
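For readers who want to try the idea, here is a minimal sketch of the core duplication step as described in the text. It is not the package code: the function name `extend_sketch`, its arguments, the toy data `heights`, and the `"all"` placeholder are all hypothetical. It assumes ungrouped input and character identifier columns, and leaves out the extras mentioned above (excluding combinations, skipping single-entry groups).

```r
library(dplyr)

# Hypothetical helper: row-binds one copy of `data` per subset of `cols`,
# overwriting the columns in that subset with `placeholder`.
extend_sketch <- function(data, cols, placeholder = "all") {
  # All subsets of `cols`; the empty subset keeps the original rows untouched.
  non_empty <- unlist(
    lapply(seq_along(cols), function(k) combn(cols, k, simplify = FALSE)),
    recursive = FALSE
  )
  subsets <- c(list(character(0)), non_empty)

  copies <- lapply(subsets, function(s) {
    copy <- data
    for (col in s) {
      copy[[col]] <- rep(placeholder, nrow(copy))  # overwrite the identifiers
    }
    copy
  })
  bind_rows(copies)
}

# Toy data; the values are made up purely for illustration.
heights <- tibble(
  country = c("DE", "DE", "FR", "FR"),
  gender  = c("f", "m", "f", "m"),
  height  = c(165, 180, 163, 178)
)

extended <- extend_sketch(heights, c("country", "gender"))

# A single grouped summary now yields every grouping level at once.
result <- extended %>%
  group_by(country, gender) %>%
  summarise(mean_height = mean(height), .groups = "drop")
```

Here `extended` holds four stacked copies of the data (16 rows), and `result` has 9 rows: one per country-gender pair, one per country, one per gender, and one for the whole dataset.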