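I have a large data frame in which some columns contain long strings of comma-separated numeric values of varying length; here they are columns A, B, and C: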
df <- data.frame(
  id = 1:3,
  A = c("200, 100, 80, 100", "120, 210, 220", "170, 200"),
  B = c("0.1, 0.2, 0.3", "0.2, 0.3, 1.0, 0.4, 0.9", "0.55, 0.77, 0.99, 0.35"),
  C = c("700.1, 701.0, 699.2", "702.5, 702.9", "705.4, 705.4, 706.0")
)
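I need to compute the percentage change between consecutive values in A, B, and C. I think that, to make this easier, I need to use separate_rows to split each number into its own row. But how can I perform this step for all three columns A, B, and C at once?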
All I have managed so far is to go column by column: first A, then B, and finally C:
library(tidyverse)
df %>%
  # Step 1 - column `A`:
  separate_rows(A, sep = ",", convert = TRUE) %>%
  mutate(A_0 = lag((lead(A)-A)/A*100)) %>%
  group_by(id) %>%
  summarise(across(c(B,C), first),
            A = paste0(A, collapse = ", "),
            A_0 = paste0(A_0, collapse = ", ")
  ) %>%
  ungroup() %>%
  # Step 2 - column `B`:
  separate_rows(B, sep = ",", convert = TRUE) %>%
  mutate(B_0 = lag((lead(B)-B)/B*100)) %>%
  group_by(id) %>%
  summarise(across(c(A,A_0,C), first),
            B = paste0(B, collapse = ", "),
            B_0 = paste0(B_0, collapse = ", ")
  ) %>%
  ungroup() %>%
  # Step 3 - column `C`:
  separate_rows(C, sep = ",", convert = TRUE) %>%
  mutate(C_0 = lag((lead(C)-C)/C*100)) %>%
  group_by(id) %>%
  summarise(across(c(A,A_0,B,B_0), first),
            C = paste0(C, collapse = ", "),
            C_0 = paste0(C_0, collapse = ", ")
  )
# A tibble: 3 × 7
id A A_0 B B_0 C C_0
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 200, 100, 80, 100 NA, -50, -20, 25 0.1, 0.… NA, 100, 50 700.1… NA, 0.12…
2 2 120, 210, 220 20, 75, 4.76190476190476 0.2, 0.… -33.33333333… 702.5… 0.471967…
3 3 170, 200 -22.7272727272727, 17.6470588235294 0.55, 0… -38.88888888… 705.4… 0.355669…
Is there a better way?
Best answer
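We can loop across the columns, split each string at a comma followed by one or more whitespace characters (,\\s+), loop over the resulting list with map, convert each element to numeric, take the lag of the difference between the lead and the current value divided by the current value (times 100), paste the results together (toString), and return them as a character vector (_chr). If needed, order the columns in select.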
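library(dplyr)
library(purrr)
df %>%
  mutate(across(A:C, ~ {
    # split each cell into its numbers, one character vector per row
    map_chr(strsplit(.x, ",\\s+"), ~ {
      tmp <- as.numeric(.x)
      # percentage change relative to the previous value; the first entry is NA
      toString(lag((lead(tmp)- tmp)/tmp *100))})
  }, .names = "{.col}_0")) %>%
  # order the columns as A, A_0, B, B_0, C, C_0 (requires the gtools package)
  select(id, gtools::mixedsort(names(.)[-1]))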
Output:
id A A_0 B B_0 C
1 1 200, 100, 80, 100 NA, -50, -20, 25 0.1, 0.2, 0.3 NA, 100, 50 700.1, 701.0, 699.2
2 2 120, 210, 220 NA, 75, 4.76190476190476 0.2, 0.3, 1.0, 0.4, 0.9 NA, 50, 233.333333333333, -60, 125 702.5, 702.9
3 3 170, 200 NA, 17.6470588235294 0.55, 0.77, 0.99, 0.35 NA, 40, 28.5714285714286, -64.6464646464647 705.4, 705.4, 706.0
C_0
1 NA, 0.128553063848018, -0.256776034236798
2 NA, 0.0569395017793562
3 NA, 0, 0.0850581230507546
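To see what the inner function does for a single cell, here is a minimal base R sketch of the same calculation. The helper name pct_change_string is hypothetical and not part of the answer above. It also assumes that splitting on ",\\s*" (zero or more spaces after the comma) is acceptable, and it uses diff() instead of lag/lead, which should give the same values, with NA in the first position.

```r
# Hypothetical helper (not from the original answer): percentage change
# between consecutive values of one comma-separated string.
pct_change_string <- function(s) {
  # Split at a comma followed by optional whitespace, then convert to numeric.
  x <- as.numeric(strsplit(s, ",\\s*")[[1]])
  # (current - previous) / previous * 100; the first value has no previous one.
  change <- c(NA, diff(x) / head(x, -1) * 100)
  # Collapse back into a single comma-separated string.
  toString(change)
}

pct_change_string("200, 100, 80, 100")
# "NA, -50, -20, 25"

# Applied to a whole character vector, one result string per element:
a <- c("200, 100, 80, 100", "120, 210, 220", "170, 200")
vapply(a, pct_change_string, character(1), USE.NAMES = FALSE)
```

Each string is handled on its own, so the first entry of every result is NA and no value leaks from one id into the next.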
https://stackoverflow.com/questions/71651525/