I have a dataset with three identifiers. Say one identifies the country, another the time, and another the person, as in the data frame below:
country person time
1 A John 1
2 A John 2
3 A John 3
4 A Peter 1
5 A Peter 2
6 A Peter 3
7 B David 1
8 B Thomas 2
9 B David 3
10 B Adam 1
11 B Adam 2
12 B Thomas 3
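How can I create a variable that generates a sequence of letters identifying each person within each country? The output should look like this: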
country person time Letterseq
1 A John 1 A
2 A John 2 A
3 A John 3 A
4 A Peter 1 B
5 A Peter 2 B
6 A Peter 3 B
7 B David 1 A
8 B Thomas 2 B
9 B David 3 A
10 B Adam 1 C
11 B Adam 2 C
12 B Thomas 3 B
Please let me know if you need more clarification.
Best answer
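If each 'country' has at most 26 unique persons, group by 'country', get a numeric index by matching 'person' against the unique values of 'person' (so persons are numbered in order of first appearance), and use that index to return the corresponding element of the built-in vector 'LETTERS'.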
library(dplyr)
df1 <- df1 %>%
  group_by(country) %>%                                           # number persons within each country
  mutate(Letterseq = LETTERS[match(person, unique(person))]) %>%  # first-appearance index -> letter
  ungroup()
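A base-R equivalent of the same idea (not part of the original answer) uses ave() to apply the match()/unique() lookup within each country, which avoids the dplyr dependency. This is a minimal sketch that re-creates the example data so it runs on its own:

```r
# Example data from the question (stringsAsFactors = FALSE keeps 'person' as character)
df1 <- data.frame(
  country = rep(c("A", "B"), each = 6),
  person  = c("John", "John", "John", "Peter", "Peter", "Peter",
              "David", "Thomas", "David", "Adam", "Adam", "Thomas"),
  time    = rep(1:3, 4),
  stringsAsFactors = FALSE
)

# ave() splits 'person' by 'country', applies FUN to each piece,
# and writes the results back in the original row order
df1$Letterseq <- ave(df1$person, df1$country,
                     FUN = function(p) LETTERS[match(p, unique(p))])
df1$Letterseq
```

Because ave() returns a vector aligned with the input rows, no explicit regrouping or ungrouping step is needed.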
-output
df1
# A tibble: 12 × 4
country person time Letterseq
<chr> <chr> <int> <chr>
1 A John 1 A
2 A John 2 A
3 A John 3 A
4 A Peter 1 B
5 A Peter 2 B
6 A Peter 3 B
7 B David 1 A
8 B Thomas 2 B
9 B David 3 A
10 B Adam 1 C
11 B Adam 2 C
12 B Thomas 3 B
-data
df1 <- structure(list(country = c("A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), person = c("John", "John", "John",
"Peter", "Peter", "Peter", "David", "Thomas", "David", "Adam",
"Adam", "Thomas"), time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12"))
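Indexing 'LETTERS' only covers up to 26 persons per country; a 27th person would get NA. One way around that, sketched here as an assumption rather than part of the original answer, is to convert the index to spreadsheet-style labels (A, ..., Z, AA, AB, ...) with a small hypothetical helper, index_to_letters():

```r
# Hypothetical helper: convert positive integers to spreadsheet-style labels
# (bijective base 26): 1 -> "A", 26 -> "Z", 27 -> "AA", 28 -> "AB", ...
index_to_letters <- function(i) {
  vapply(i, function(n) {
    out <- character(0)
    while (n > 0) {
      r <- (n - 1) %% 26               # 0..25 selects A..Z for this position
      out <- c(LETTERS[r + 1], out)    # prepend, since we build right to left
      n <- (n - 1) %/% 26              # move to the next, more significant position
    }
    paste(out, collapse = "")
  }, character(1))
}

# Demo: one country with 30 persons, more than LETTERS can cover
big <- data.frame(country = "C", person = paste0("P", 1:30),
                  stringsAsFactors = FALSE)
big$Letterseq <- ave(big$person, big$country,
                     FUN = function(p) index_to_letters(match(p, unique(p))))
tail(big$Letterseq, 4)
```

The same helper could replace the LETTERS[...] lookup inside the dplyr mutate() call: Letterseq = index_to_letters(match(person, unique(person))).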
https://stackoverflow.com/questions/74102204/