I want to compute, for each group, a cumulative sum over a 5-day window.
library(lubridate)  # provides ymd()
df <- data.frame(
  date = ymd(c("2022-01-02","2022-01-03","2022-01-05","2022-01-07","2022-01-11","2022-01-14","2022-01-17","2022-01-18","2022-01-24","2022-01-27",
               "2022-01-01","2022-01-04","2022-01-04","2022-01-08","2022-01-12","2022-01-14","2022-01-19","2022-01-24","2022-01-25","2022-01-28")),
  group = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
  number = c(10,30,20,50,30,50,40,50,30,50,55,10,30,20,50,30,40,30,40,30))
Below is a small sample of my data frame, including what the cumulative sum column should return. Any help would be appreciated. Thanks.
date        group  number  cumsum(s)
2022-01-02  A          10         10
2022-01-03  A          30         40
2022-01-05  A          20         60
2022-01-07  A          50        110
2022-01-11  A          30         80
2022-01-14  A          50         80
2022-01-17  A          40         90
2022-01-18  A          50        140
2022-01-24  A          30         30
2022-01-27  A          50         80
2022-01-01  B          55         55
2022-01-04  B          10         65
2022-01-04  B          30         95
2022-01-08  B          20         60
2022-01-12  B          50         70
2022-01-14  B          30         80
2022-01-19  B          40         70
2022-01-24  B          30         70
2022-01-25  B          40         70
2022-01-28  B          30        100
I tried using map() and cumsum(), but it did not work.
Best answer
You can join the data to itself with a non-equi join in data.table, and use by = .EACHI to compute the sum for each row's window.
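library(data.table)
# d is the start of each row's window: 5 days before its date
df = setDT(df)[, d := date - 5][]
cbind(
  # non-equi self-join: for each row of the inner df, match the rows of the same
  # group dated between d and date, and sum their number (by = .EACHI)
  df[df, on = .(group, date <= date, date >= d), .(result = sum(number)), by = .EACHI][, .(group, date, result)],
  df[, .(number)]
)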
Output:
    group       date result number
 1:     A 2022-01-02     10     10
 2:     A 2022-01-03     40     30
 3:     A 2022-01-05     60     20
 4:     A 2022-01-07    110     50
 5:     A 2022-01-11     80     30
 6:     A 2022-01-14     80     50
 7:     A 2022-01-17     90     40
 8:     A 2022-01-18    140     50
 9:     A 2022-01-24     30     30
10:     A 2022-01-27     80     50
11:     B 2022-01-01     55     55
12:     B 2022-01-04     95     10
13:     B 2022-01-04     95     30
14:     B 2022-01-08     60     20
15:     B 2022-01-12     70     50
16:     B 2022-01-14     80     30
17:     B 2022-01-19     70     40
18:     B 2022-01-24     70     30
19:     B 2022-01-25     70     40
20:     B 2022-01-28    100     30
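The join treats rows that share a date alike, so both 2022-01-04 rows of group B get 95, whereas the question's expected column lists 65 and then 95. If a running total in row order is what is wanted, a minimal base R sketch might look like the following. It assumes the rows are sorted by date within each group; window_days, idx and running are names chosen here for illustration, and as.Date() stands in for ymd() so that no package is needed.

```r
# Sample data from the question (as.Date() used in place of lubridate::ymd())
df <- data.frame(
  date = as.Date(c(
    "2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
    "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
    "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
    "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28")),
  group = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50,
             55, 10, 30, 20, 50, 30, 40, 30, 40, 30)
)

window_days <- 5          # look back this many days from each row's date
idx <- seq_len(nrow(df))  # row positions, used to exclude later rows

# For row i, sum the rows of the same group that are dated within the window
# and that appear at or before row i (assumes dates ascend within each group)
df$running <- vapply(idx, function(i) {
  in_window <- df$group == df$group[i] &
    df$date >= df$date[i] - window_days &
    idx <= i
  sum(df$number[in_window])
}, numeric(1))

print(df)
```

This compares every row with every other row, which is fine for a small data frame; for large data the data.table join is likely the better fit.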
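Suppose we want to do this for several window widths. Another approach is to take, for each group, the cumulative sum over the full range of days (including days that have no rows), and then use shift(). One caveat: shift() offsets by rows, not by days, so the windows are only exact when a group has at most one row per date. Group B has two rows on 2022-01-04, which is why row 13 of the output below shows c3 = 40 although the values dated 2022-01-01 to 2022-01-04 sum to 95.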
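library(data.table)
setDT(df)
# expand each group to one row per calendar day (days without data get NA),
# then take the running total cs with missing numbers counted as 0
df = df[df[, data.table(date = seq(min(date), max(date), 1)), group], on = .(group, date)][
  , cs := cumsum(fifelse(is.na(number), 0, number)), group]
# sum over the current row and the s rows before it: cs minus cs lagged by s + 1
f <- function(s, cs) {
  r = cs - shift(cs, s + 1)
  r[is.na(r)] <- cs[is.na(r)]  # no row s + 1 back: the window reaches the group start
  r
}
s = c(3, 5, 7, 10)  # window widths in days
# add one column per width (c3, c5, c7, c10), then keep only the original rows
df[, (paste0("c", s)) := lapply(s, f, cs = cs), group][!is.na(number)]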
Output:
          date  group number    cs    c3    c5    c7   c10
        <Date> <char>  <num> <num> <num> <num> <num> <num>
 1: 2022-01-02      A     10    10    10    10    10    10
 2: 2022-01-03      A     30    40    40    40    40    40
 3: 2022-01-05      A     20    60    60    60    60    60
 4: 2022-01-07      A     50   110    70   110   110   110
 5: 2022-01-11      A     30   140    30    80   100   140
 6: 2022-01-14      A     50   190    80    80   130   150
 7: 2022-01-17      A     40   230    90    90   120   170
 8: 2022-01-18      A     50   280    90   140   170   170
 9: 2022-01-24      A     30   310    30    30   120   170
10: 2022-01-27      A     50   360    80    80    80   170
11: 2022-01-01      B     55    55    55    55    55    55
12: 2022-01-04      B     10    65    65    65    65    65
13: 2022-01-04      B     30    95    40    95    95    95
14: 2022-01-08      B     20   115    20    60    60   115
15: 2022-01-12      B     50   165    50    70    70   110
16: 2022-01-14      B     30   195    80    80   100   130
17: 2022-01-19      B     40   235    40    70   120   120
18: 2022-01-24      B     30   265    30    70    70   100
19: 2022-01-25      B     40   305    70    70   110   110
20: 2022-01-28      B     30   335    70   100   100   140
          date  group number    cs    c3    c5    c7   c10
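The arithmetic inside f() is a prefix-sum difference, and it can be checked by hand on a short vector. The sketch below uses base R only. lag_vec and window_sum are hypothetical helpers written for this illustration (lag_vec stands in for data.table's shift()), and daily holds group B's totals for 2022-01-01 through 2022-01-08 with the two 2022-01-04 rows summed into a single entry. That aggregation is an assumption about how duplicate dates should be handled; the answer's code keeps them as separate rows.

```r
# Group B, one entry per calendar day from 2022-01-01 to 2022-01-08;
# the two rows dated 2022-01-04 (10 and 30) are summed into 40
daily <- c(55, 0, 0, 40, 0, 0, 0, 20)
cs <- cumsum(daily)  # running total per day

# Base R stand-in for shift(): move x down by n positions, padding with NA
lag_vec <- function(x, n) c(rep(NA, n), x)[seq_along(x)]

# Sum over the current day and the s days before it
window_sum <- function(cs, s) {
  r <- cs - lag_vec(cs, s + 1)
  r[is.na(r)] <- cs[is.na(r)]  # window reaches back past the first day
  r
}

print(window_sum(cs, 3))
print(window_sum(cs, 7))
```

With one entry per day, a lag of s + 1 positions is exactly s + 1 days, so the 3-day window ending on 2022-01-04 covers all of 2022-01-01 through 2022-01-04.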
https://stackoverflow.com/questions/74254526/