I am finding RcppRoll very convenient to use in conjunction with dplyr, with one caveat: if I am computing rolling summaries over a numeric vector that is indexed by date (or another time period), I may want dates that are missing from the data to still count towards the rolling window (with value 0). This is analogous to using OLAP functions in SQL with range.
mutate_over() is my most recent attempt at implementing this functionality. @kevinushey @hadley I am wondering whether something like this will sit in RcppRoll or dplyr in the future?
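To make the idea concrete, here is a minimal base R sketch of a rolling sum taken over a date range rather than over a fixed number of rows. The helper name roll_sum_range() and its arguments are hypothetical; it is not part of RcppRoll, dplyr, or dtsr, and it is not how mutate_over() is implemented. Dividing the result by the window length gives a mean in which missing dates count as 0.

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# for each row, sum the values whose date falls in the window
# (date - days, date], so dates absent from the data contribute 0.
roll_sum_range <- function(dates, values, days) {
  stopifnot(length(dates) == length(values))
  vapply(seq_along(dates), function(i) {
    in_window <- dates > dates[i] - days & dates <= dates[i]
    sum(as.numeric(values[in_window]))
  }, numeric(1))
}

dates <- as.Date(c("2014-01-01", "2014-01-02", "2014-01-05", "2014-01-09"))
sales <- c(10, 20, 30, 40)

weekly_sum <- roll_sum_range(dates, sales, days = 7)
weekly_mean <- weekly_sum / 7  # missing days are treated as zero sales
```

This version compares every row against every other row, so it is quadratic in the number of rows; it is only meant to show the semantics, not to be fast.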
@kevinushey @hadley On a related note, the user may want to expand the range so that it includes all dates within a given period.
For example, whilst mutate_over() (or another range-based function) gives the user a way of calculating a rolling metric over the dates that are present in their data frame, they may also want values for the dates that are missing, e.g. for visualisation.
To address this, I have created another function, regularise(), which allows the user to create a data frame with a 'full' index of dates. However, over large data frames it is noticeably slower than the RcppRoll and dplyr APIs. Do you think it would be possible to rewrite it using Rcpp? (I have no meaningful experience with it.)
If this sounds a bit cryptic, hopefully the following example will illustrate my point:
devtools::install_github("Mullefa/dtsr")
library(dplyr)
library(dtsr)
library(ggvis)

# Rolling mean padded at the start with cumulative means, so the output
# has the same length as x.
roll_mean <- function(x, n) {
  out <- RcppRoll::roll_mean(x, n)
  c(dplyr::cummean(x[seq_len(n - 1)]), out)
}

# If the date doesn't appear in the data frame, say no sales occurred on that day.
sales_data <- data_frame(
  date = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = 1),
  sales = sample(1:1000, length(date), replace = TRUE)
) %>%
  sample_n(250) %>%
  ts_df(date) %>%
  arrange()

# This graph doesn't visualize the average weekly sales for dates on which there were no sales.
sales_data %>%
  mutate_over(avg_weekly_sales = roll_mean(sales, n = 7)) %>%
  ggvis(~date, ~avg_weekly_sales) %>%
  layer_bars()

# Using regularise() followed by mutate(), this graph does.
sales_data %>%
  regularise() %>%
  mutate(avg_weekly_sales = roll_mean(sales, n = 7)) %>%
  ggvis(~date, ~avg_weekly_sales) %>%
  layer_bars()
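
For readers without dtsr installed, the effect of filling in a 'full' index of dates can be sketched in base R roughly as follows. The function fill_dates() and its arguments are hypothetical stand-ins; the actual regularise() implementation may well differ. The sketch assumes a daily index and fills only numeric columns.

```r
# Hypothetical stand-in for a regularise()-style step (an assumption, not
# the dtsr implementation): build every date between the first and last
# observation, join the data onto it, and fill numeric gaps with `fill`.
fill_dates <- function(df, date_col = "date", fill = 0) {
  all_dates <- data.frame(
    seq(min(df[[date_col]]), max(df[[date_col]]), by = "day")
  )
  names(all_dates) <- date_col

  # Left join: keep every date, with NA where the original had no row.
  out <- merge(all_dates, df, by = date_col, all.x = TRUE)

  # Replace the NAs in numeric columns only; the Date column is not numeric.
  numeric_cols <- names(out)[vapply(out, is.numeric, logical(1))]
  for (col in numeric_cols) {
    out[[col]][is.na(out[[col]])] <- fill
  }
  out
}

sparse <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-03", "2014-01-06")),
  sales = c(5, 7, 9)
)
full <- fill_dates(sparse)
```

After this step every calendar day has a row, so a fixed-width rolling function such as roll_mean(sales, n = 7) covers a true seven-day window.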