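I have two data frames. The first is a reference data frame, ref, and the second is my universe, univ; both are built in the code below.
I want to merge them and keep only the unique pairs from the universe that also exist in the reference data frame, where a pair counts as the same combination regardless of column order (so e/a in the universe matches a/e in the reference).
Ideally, the resulting data frame would contain just those matching pairs.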
library(tidyverse)

# Reference pairs
var1 <- c("a", "b", "c", "d")
var2 <- c("e", "z", "f", "h")
ref <- tibble(var1, var2)
ref

# Universe of candidate pairs
sym1 <- c("e", "b", "b", "f", "n", "n", "k")
sym2 <- c("a", "f", "z", "c", "s", "k", "l")
univ <- tibble(sym1, sym2)
univ
How can I do this in R using dplyr?
Best answer
In base R:
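# Order-independent key for each univ row: sort its two values, then paste
s <- apply(univ, 1, \(x) paste(sort(x), collapse = " "))
# Keys for ref; this relies on var1 sorting before var2 in every row
r <- paste(ref$var1, ref$var2)
# Pick the univ row matching each ref pair (NA where there is no match)
univ[match(r, s), ]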
  sym1 sym2
1 e    a
2 b    z
3 f    c
4 NA   NA
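The NA row appears because the reference pair d/h has no counterpart in univ. If only the matched pairs are wanted, one possible variation is to drop the unmatched indices before subsetting. The sketch below also sorts each ref row when building its key, so ref does not have to be pre-sorted; idx and matched are illustrative names, not part of the code above.

```r
library(tibble)

# Same example data as in the question
ref  <- tibble(var1 = c("a", "b", "c", "d"),
               var2 = c("e", "z", "f", "h"))
univ <- tibble(sym1 = c("e", "b", "b", "f", "n", "n", "k"),
               sym2 = c("a", "f", "z", "c", "s", "k", "l"))

# Order-independent keys for both tables
s <- apply(univ, 1, \(x) paste(sort(x), collapse = " "))
r <- apply(ref,  1, \(x) paste(sort(x), collapse = " "))

# Row of univ matching each ref pair; NA means the pair is absent from univ
idx <- match(r, s)

# Keep only the pairs that were actually found
matched <- univ[idx[!is.na(idx)], ]
matched
```

Because match() returns only the first hit, a pair that occurs several times in univ is reported once.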
As a tidyverse-friendly pipeline:
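library(stringr)
library(dplyr)

univ %>%
  rowwise() %>%
  # Order-independent key per row: sort the two values, then concatenate
  mutate(s = str_c(sort(c_across(everything())), collapse = "")) %>%
  pull(s) %>%
  # Position in univ of each ref pair (NA if absent); relies on var1 sorting before var2
  match(str_c(ref$var1, ref$var2), .) %>%
  univ[., ]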
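Since the question asks for dplyr specifically, another option is a join on an order-independent key. This is a different technique from the match() approach: a minimal sketch, assuming both tables hold two character columns, with lo, hi, ref_key and matched as illustrative names.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
ref  <- tibble(var1 = c("a", "b", "c", "d"),
               var2 = c("e", "z", "f", "h"))
univ <- tibble(sym1 = c("e", "b", "b", "f", "n", "n", "k"),
               sym2 = c("a", "f", "z", "c", "s", "k", "l"))

# Canonical form of each reference pair: smaller value first, larger second
ref_key <- ref %>%
  transmute(lo = pmin(var1, var2), hi = pmax(var1, var2))

matched <- univ %>%
  # Same canonical form for the universe
  mutate(lo = pmin(sym1, sym2), hi = pmax(sym1, sym2)) %>%
  # Keep universe rows whose key exists in the reference
  semi_join(ref_key, by = c("lo", "hi")) %>%
  # Report each combination once, even if univ lists it in both orders
  distinct(lo, hi, .keep_all = TRUE) %>%
  select(sym1, sym2)
matched
```

Unlike subsetting by match(), semi_join() never produces an NA row for reference pairs missing from univ, and the result keeps the row order of univ.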
https://stackoverflow.com/questions/74233237/