r - Conditional cumulative sum linked to dates in R

I want to calculate, for each group, a cumulative sum over a 5-day window.

library(lubridate)  # provides ymd()

df <- data.frame(
  date = ymd( c( "2022-01-02","2022-01-03","2022-01-05","2022-01-07","2022-01-11","2022-01-14","2022-01-17","2022-01-18","2022-01-24","2022-01-27","2022-01-01","2022-01-04","2022-01-04","2022-01-08","2022-01-12","2022-01-14","2022-01-19","2022-01-24","2022-01-25","2022-01-28")),
  group = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
  number = c(10,30,20,50,30,50,40,50,30,50,55,10,30,20,50,30,40,30,40,30))

Below is a small sample of my data frame, including what the cumulative sum column should return. Any help would be appreciated. Thanks.

date       group number cumsum(s)
2022-01-02 A     10     10
2022-01-03 A     30     40
2022-01-05 A     20     60
2022-01-07 A     50     110
2022-01-11 A     30     80
2022-01-14 A     50     80
2022-01-17 A     40     90
2022-01-18 A     50     140
2022-01-24 A     30     30
2022-01-27 A     50     80
2022-01-01 B     55     55
2022-01-04 B     10     65
2022-01-04 B     30     95
2022-01-08 B     20     60
2022-01-12 B     50     70
2022-01-14 B     30     80
2022-01-19 B     40     70
2022-01-24 B     30     70
2022-01-25 B     40     70
2022-01-28 B     30     100

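I tried using map() and cumsum(), but without success.

Best answer

You can join the data to itself with a non-equi join in data.table, and use .EACHI to compute the windowed sum once for each row.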

library(data.table)

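# Add a helper column d holding the start of each row's 5-day window
df = setDT(df)[, d:=date-5][]

# Self-join: for each row, sum number over same-group rows whose date lies in [d, date]
cbind(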
  df[df,on=.(group,date<=date, date>=d), .(result = sum(number)), .EACHI][, .(group,date,result)],
  df[, .(number)]
)

Output:

    group       date result number
 1:     A 2022-01-02     10     10
 2:     A 2022-01-03     40     30
 3:     A 2022-01-05     60     20
 4:     A 2022-01-07    110     50
 5:     A 2022-01-11     80     30
 6:     A 2022-01-14     80     50
 7:     A 2022-01-17     90     40
 8:     A 2022-01-18    140     50
 9:     A 2022-01-24     30     30
10:     A 2022-01-27     80     50
11:     B 2022-01-01     55     55
12:     B 2022-01-04     95     10
13:     B 2022-01-04     95     30
14:     B 2022-01-08     60     20
15:     B 2022-01-12     70     50
16:     B 2022-01-14     80     30
17:     B 2022-01-19     70     40
18:     B 2022-01-24     70     30
19:     B 2022-01-25     70     40
20:     B 2022-01-28    100     30
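
As a sanity check on the join above, the same 5-day window sum can be written as an explicit row-by-row filter in base R. This is only a sketch for verifying results, not the answer's method: it rebuilds the question's data with `as.Date()` instead of `ymd()`, and it compares every row against every other row, so the non-equi join is likely the better choice for larger data.

```r
# Rebuild the question's data with base R only (as.Date() instead of lubridate::ymd())
df <- data.frame(
  date = as.Date(c(
    "2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
    "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
    "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
    "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28"
  )),
  group = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50,
             55, 10, 30, 20, 50, 30, 40, 30, 40, 30)
)

# For each row, sum number over rows of the same group whose date
# falls in the closed interval [date - 5, date]
df$result <- sapply(seq_len(nrow(df)), function(i) {
  in_window <- df$group == df$group[i] &
    df$date <= df$date[i] &
    df$date >= df$date[i] - 5
  sum(df$number[in_window])
})

df
```

Because the window is defined on dates rather than on row order, the two group B rows dated 2022-01-04 receive the same total, as in the join output.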

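Update - an alternative approach

Suppose we want to do this for several shift values. Another approach is to take each group's cumulative sum over its full range of days (including days that do not appear in the data), and then use shift.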
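
The idea can be illustrated on a toy vector before applying it to the table. The sketch below uses base R only: `daily` holds group A's first ten calendar days from the question (2022-01-02 to 2022-01-11) with 0 on days that have no row, and `lag_by()` is a hypothetical stand-in for `data.table::shift()` written for this illustration.

```r
# Group A, 2022-01-02 .. 2022-01-11, one entry per calendar day (0 = no row that day)
daily <- c(10, 30, 0, 20, 0, 50, 0, 0, 0, 30)
cs <- cumsum(daily)

# Hypothetical helper: lag x by n positions, padding the front with NA
lag_by <- function(x, n) c(rep(NA, n), x)[seq_along(x)]

s <- 5                        # look back 5 days, so the window covers s + 1 days
r <- cs - lag_by(cs, s + 1)   # running total minus the total just before the window
r[is.na(r)] <- cs[is.na(r)]   # early days: the window reaches back past the first day
r
```

Positions 1, 2, 4, 6 and 10 of `r` correspond to the dates that actually occur in group A, and give 10, 40, 60, 110 and 80.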

  1. Load data.table and convert df with `setDT()`
library(data.table)
setDT(df)
  2. Expand each group to every day between its minimum and maximum date, and take the cumulative sum within each group
df = df[df[, data.table(date=seq(min(date), max(date),1)), group], on=.(group,date)][
  ,cs:=cumsum(fifelse(is.na(number),0,number)),group]
  3. Now create a helper function f that takes a shift value (e.g. 5 or 10) and returns the windowed sum for that shift
f <- function(s,cs) {
  r = cs-shift(cs,s+1)
  r[is.na(r)] <- cs[is.na(r)]
  r
}
  4. Set up a vector of the desired shifts, and apply the function to each shift value
s = c(3,5,7,10)
df[, (paste0("c",s)):=lapply(s, f,cs=cs),group][!is.na(number)]

Output:

          date  group number    cs    c3    c5    c7   c10
        <Date> <char>  <num> <num> <num> <num> <num> <num>
 1: 2022-01-02      A     10    10    10    10    10    10
 2: 2022-01-03      A     30    40    40    40    40    40
 3: 2022-01-05      A     20    60    60    60    60    60
 4: 2022-01-07      A     50   110    70   110   110   110
 5: 2022-01-11      A     30   140    30    80   100   140
 6: 2022-01-14      A     50   190    80    80   130   150
 7: 2022-01-17      A     40   230    90    90   120   170
 8: 2022-01-18      A     50   280    90   140   170   170
 9: 2022-01-24      A     30   310    30    30   120   170
10: 2022-01-27      A     50   360    80    80    80   170
11: 2022-01-01      B     55    55    55    55    55    55
12: 2022-01-04      B     10    65    65    65    65    65
13: 2022-01-04      B     30    95    40    95    95    95
14: 2022-01-08      B     20   115    20    60    60   115
15: 2022-01-12      B     50   165    50    70    70   110
16: 2022-01-14      B     30   195    80    80   100   130
17: 2022-01-19      B     40   235    40    70   120   120
18: 2022-01-24      B     30   265    30    70    70   100
19: 2022-01-25      B     40   305    70    70   110   110
20: 2022-01-28      B     30   335    70   100   100   140
          date  group number    cs    c3    c5    c7   c10
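
One thing worth checking before relying on the shifted columns: shifting by s + 1 rows is the same as shifting by s + 1 days only if each group has exactly one row per calendar day. Group B has two rows dated 2022-01-04, so its expanded table carries one extra row, which appears to be why rows 12 and 13 of the output above show 65 and 40 under c3. The sketch below, in base R and for group B only, collapses the data to one total per day first; `lag_by()` is a hypothetical stand-in for `data.table::shift()`, and the aggregation step is an assumption about the desired behaviour, not part of the original answer.

```r
# Group B rows from the question, rebuilt with base R
b <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08",
                   "2022-01-12", "2022-01-14", "2022-01-19", "2022-01-24",
                   "2022-01-25", "2022-01-28")),
  number = c(55, 10, 30, 20, 50, 30, 40, 30, 40, 30)
)

# A repeated date breaks the "one row per day" assumption
has_dups <- any(duplicated(b$date))

# Collapse to one total per date, then expand to every calendar day (0 on empty days)
daily <- aggregate(number ~ date, data = b, FUN = sum)
all_days <- data.frame(date = seq(min(b$date), max(b$date), by = "day"))
daily <- merge(all_days, daily, all.x = TRUE)
daily$number[is.na(daily$number)] <- 0
daily$cs <- cumsum(daily$number)

# Hypothetical helper: lag x by n positions, padding the front with NA
lag_by <- function(x, n) c(rep(NA, n), x)[seq_along(x)]

s <- 3
r <- daily$cs - lag_by(daily$cs, s + 1)
r[is.na(r)] <- daily$cs[is.na(r)]
daily$c3 <- r

daily[daily$date == as.Date("2022-01-04"), ]
```

With one row per day, the 3-day window ending 2022-01-04 sums to 95 (55 + 10 + 30). The per-day totals can then be joined back onto the original rows by date if a value is needed for every input row.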

https://stackoverflow.com/questions/74254526/
