r - Apply a custom function to multiple columns with dplyr::mutate(across())

df

a = c("aa", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
b = c("aa", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
c = c("aa", "aa", "aa", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
d = c(1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1)
df = data.frame(a,b,c,d)

Column names:

cols <- c("a","b","c")

Function:

rare_label <- function(x){
  freq = prop.table(table(unlist(x)))
  make_rare = names(freq)[freq < 0.20]
  lapply(x,
         function(x) {
           replace(x, x %in% make_rare, "Rare")
         })}

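I want to use dplyr::mutate(across()) to compute the proportion of each value across a, b and c combined, and then relabel any category whose proportion is below 20% as "Rare".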

Expected output:

     a    b    c
    Rare Rare Rare
    bb   bb   Rare
    cc   cc   Rare
    bb   bb   bb
    bb   bb   bb
    cc   cc   cc
    bb   bb   bb
    .    .    .
    .    .    .
    .    .    .
    

Running the code below throws an error, and I'm not sure why.

df %<>%
  mutate(across(where(cols), ~rare_label(.)

Error: unexpected symbol in: " mutate(across(where(cols), ~rare_label(.) View"
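The message is a parse error: `mutate(` and `across(` are never closed, so R keeps reading into what is presumably the next line of the script (the `View` shown in the message) and stops at an unexpected symbol. Once the parentheses are balanced, two more problems would surface: `where()` expects a predicate function such as `is.character`, not a vector of column names (`all_of(cols)` selects by name), and `rare_label()` returns a list via `lapply()` rather than a character vector. A minimal sketch of a corrected version of the asker's approach, assuming dplyr >= 1.0 and magrittr (which provides `%<>%`); it computes proportions within each column, which for this data flags the same categories as pooling a, b and c:

```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

df <- data.frame(
  a = c("aa", "bb", "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb",
        "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb", "bb"),
  b = c("aa", "bb", "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb",
        "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb", "bb"),
  c = c("aa", "aa", "aa", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb",
        "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb", "bb"),
  d = c(1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE  # on R < 4.0 a factor column would turn "Rare" into NA
)
cols <- c("a", "b", "c")

# Takes ONE column (a vector) and returns a vector of the same length
rare_label <- function(x) {
  freq <- prop.table(table(x))           # share of each category in this column
  make_rare <- names(freq)[freq < 0.20]  # categories below 20%
  replace(x, x %in% make_rare, "Rare")   # plain vector, no lapply()
}

# all_of() selects columns by name; mutate( and across( are both closed
df %<>%
  mutate(across(all_of(cols), rare_label))

head(df, 3)
```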

Best answer

One option could be:

library(dplyr)

df %>%
 mutate(across(all_of(cols), 
               ~ replace(., . %in% names(which(prop.table(table(.)) < 0.20)), "rare")))

      a    b    c d
1  rare rare rare 1
2    bb   bb rare 1
3    cc   cc rare 2
4    bb   bb   bb 2
5    bb   bb   bb 3
6    cc   cc   cc 3
7    bb   bb   bb 1
8    bb   bb   bb 1
9    cc   cc   cc 1
10   cc   cc   cc 1

If you want to apply an existing function:

fun <- function(x) replace(x, x %in% names(which(prop.table(table(x)) < 0.20)), "rare")

df %>%
 mutate(across(all_of(cols), fun))
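The answer above computes the proportions separately within each column, whereas the question asks about the values of a, b and c combined. A minimal sketch of the pooled variant, assuming dplyr >= 1.0: the rare categories are determined once from all selected columns together and then replaced in each column. The names `pooled_freq` and `rare_levels` are illustrative and not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  a = c("aa", "bb", "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb",
        "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb", "bb"),
  b = c("aa", "bb", "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb",
        "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb", "bb"),
  c = c("aa", "aa", "aa", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb",
        "cc", "bb", "bb", "cc", "bb", "bb", "cc", "cc", "bb", "bb"),
  d = c(1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
cols <- c("a", "b", "c")

# Proportions over the values of a, b and c combined (3 x 21 = 63 values)
pooled_freq <- prop.table(table(unlist(df[cols])))
rare_levels <- names(pooled_freq)[pooled_freq < 0.20]

# Replace the pooled rare categories in every selected column
df <- df %>%
  mutate(across(all_of(cols), ~ replace(.x, .x %in% rare_levels, "rare")))

rare_levels
```

For this data only "aa" falls below 20% (5 of 63 values), so the result matches the per-column output above; on other data the two approaches can flag different categories.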

https://stackoverflow.com/questions/63731527/
