r - Calculate the proportion of values for every two rows, by group, in R

I have this dataset:



df <- tibble(id, event, duration)

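For each "dive" row, I need to calculate the proportion of surface duration using the "surface" row that follows it, and insert the result into a new column. All of this should be done separately for each "id".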

proportion = surface / (dive + surface)

#Output dataframe

# A tibble: 8 x 4
  id    event   duration proportion    
1 A     surface       56 x         
2 A     surface       96 x         
3 A     surface       14 x         
4 A     surface       77 x         
5 B     surface       28 x         
6 B     surface       63 x         
7 B     surface       47 x         
8 B     surface       90 x   

############################################################

Edit:

In my original data, some "dive" events have no matching "surface", and the suggested code fails with an error:

Error in `dplyr::mutate()`:
! Problem while computing `proportion = DurationMin[What ==
  "Surface"]/sum(DurationMin)`.
✖ `proportion` must be size 2 or 1, not 0.
ℹ The error occurred in group 2803: ptt = "2017111870", grp = 1015.

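Within an "id" there can be an odd number of rows, where a "dive" event has no "surface" after it in the sequence. So whenever an unpaired event is encountered, I need to either ignore it or insert an NA. Is that possible?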

As in this example data frame:


library(tibble)

# 17 rows: id "B" contains one unpaired dive (row 15)
id <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B")

event <- c("dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "dive", "surface")

duration <- c(55, 56, 40, 96, 58, 14, 43, 77, 19, 28, 34, 63, 29, 47, 61, 45, 30)

df <- tibble(id, event, duration)

> df
   id   event duration
1   A    dive       55
2   A surface       56
3   A    dive       40
4   A surface       96
5   A    dive       58
6   A surface       14
7   A    dive       43
8   A surface       77
9   B    dive       19
10  B surface       28
11  B    dive       34
12  B surface       63
13  B    dive       29
14  B surface       47
15  B    dive       61
16  B    dive       45
17  B surface       30
> 

Best answer

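We can use gl to create a grouping index for every 2 rows, then create the 'proportion' column by dividing the 'duration' where the event value is 'surface' (event == 'surface') by the sum of 'duration' in that group.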

library(dplyr)
df %>%
   group_by(id) %>%
   group_by(grp = as.integer(gl(n(), 2, n())), .add = TRUE) %>% 
   mutate(proportion = duration[event == 'surface'][1]/sum(duration)) %>%
   ungroup %>%
   select(-grp)

-output

# A tibble: 16 × 4
   id    event   duration proportion
   <chr> <chr>      <dbl>      <dbl>
 1 A     dive          55      0.505
 2 A     surface       56      0.505
 3 A     dive          40      0.706
 4 A     surface       96      0.706
 5 A     dive          58      0.194
 6 A     surface       14      0.194
 7 A     dive          43      0.642
 8 A     surface       77      0.642
 9 B     dive          19      0.596
10 B     surface       28      0.596
11 B     dive          34      0.649
12 B     surface       63      0.649
13 B     dive          29      0.618
14 B     surface       47      0.618
15 B     dive          61      0.596
16 B     surface       90      0.596
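
To make the grouping index concrete, here is a minimal sketch of what the gl call produces. The value n <- 6 is a hypothetical group size standing in for n(), which dplyr supplies inside a grouped pipeline.

```r
# Hypothetical group size; inside group_by(id), n() would supply this value
n <- 6

# gl(n, 2, n) builds a factor whose levels are each repeated twice,
# cut to length n, so each consecutive pair of rows shares one level
grp <- as.integer(gl(n, 2, n))
print(grp)  # 1 1 2 2 3 3
```

This index pairs rows purely by position, so it only matches each dive with its surface when the two strictly alternate within an id.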

For the new dataset, where some dives are unpaired, we can instead start a new group at every 'dive':

df %>% 
  group_by(id) %>% 
  group_by(grp = cumsum(event == 'dive'), .add = TRUE) %>% 
  mutate(proportion = duration[event == 'surface'][1]/sum(duration)) %>% 
  ungroup %>%
  select(-grp)

-output

# A tibble: 17 × 4
   id    event   duration proportion
   <chr> <chr>      <int>      <dbl>
 1 A     dive          55      0.505
 2 A     surface       56      0.505
 3 A     dive          40      0.706
 4 A     surface       96      0.706
 5 A     dive          58      0.194
 6 A     surface       14      0.194
 7 A     dive          43      0.642
 8 A     surface       77      0.642
 9 B     dive          19      0.596
10 B     surface       28      0.596
11 B     dive          34      0.649
12 B     surface       63      0.649
13 B     dive          29      0.618
14 B     surface       47      0.618
15 B     dive          61     NA    
16 B     dive          45      0.4  
17 B     surface       30      0.4  
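
To see why this version copes with unpaired dives, the sketch below uses a small stand-in data frame (df_small, grp_b, empty and res are hypothetical names, not from the question). It relies on two facts: cumsum(event == 'dive') starts a new group at every dive, and indexing a zero-length vector with [1] gives NA instead of a zero-length result. The final filter() step is only one assumed way to "ignore" unpaired events, keeping the surface rows as in the desired output of the question.

```r
library(dplyr)

# Hypothetical stand-in data: id "B" starts with an unpaired dive (duration 61)
df_small <- tibble(
  id       = c("A", "A", "B", "B", "B"),
  event    = c("dive", "surface", "dive", "dive", "surface"),
  duration = c(55, 56, 61, 45, 30)
)

# Every dive increments the counter, so an unpaired dive sits alone in its group
grp_b <- cumsum(df_small$event[df_small$id == "B"] == "dive")
print(grp_b)  # 1 2 2

# Subsetting with a FALSE condition gives a zero-length vector;
# taking [1] of it returns NA, which mutate() can recycle within the group
empty <- df_small$duration[FALSE]
print(length(empty))  # 0
print(empty[1])       # NA

res <- df_small %>%
  group_by(id) %>%
  group_by(grp = cumsum(event == "dive"), .add = TRUE) %>%
  mutate(proportion = duration[event == "surface"][1] / sum(duration)) %>%
  ungroup() %>%
  select(-grp) %>%
  # Assumed way to "ignore" unpaired dives: keep only the surface rows
  filter(event == "surface")

print(res)
```

Dropping the filter() line keeps every row, with NA in proportion for the unpaired dive.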

data

df <- structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B", "B", "B", "B"), event = c("dive", 
"surface", "dive", "surface", "dive", "surface", "dive", "surface", 
"dive", "surface", "dive", "surface", "dive", "surface", "dive", 
"dive", "surface"), duration = c(55L, 56L, 40L, 96L, 58L, 14L, 
43L, 77L, 19L, 28L, 34L, 63L, 29L, 47L, 61L, 45L, 30L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17"))

https://stackoverflow.com/questions/72860923/
