Suppose I have a vector like this:
patient_condition <- c("Pre_P1","Post_P1","Enriched_Post_P1","Post_P1_2","Pre_P2","Post_P2", "P3_Pre")
to_match <- c("P1","P2","P3")
I want to create another vector that contains, for each element, only the value from to_match that occurs in it as a substring.
[1] "P1" "P1" "P1" "P1" "P2" "P2" "P3"
Any help is appreciated. Thanks!
Best answer
We can use
stringr::str_extract(patient_condition, "P[0-9]+")
#[1] "P1" "P1" "P1" "P1" "P2" "P2" "P3"
Other replies
In my case, this answer works, but I guess the question I am asking is about extracting substrings from a vector given some values to match. That is, this answer won't work if I want to extract characters (e.g. Pre, Post, Enriched, etc.).
to_match <- c("Pre", "Post", "Enriched")
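In that case, we can use
## R-level loop through `to_match`
tmp <- t(sapply(to_match, stringr::str_extract, string = patient_condition))
tmp[!is.na(tmp)]
#[1] "Pre" "Post" "Post" "Enriched" "Post" "Pre" "Post" "Pre"
## note: "Enriched_Post_P1" matches both "Post" and "Enriched", so this returns 8 values for 7 inputs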
or
## convert multiple matches to REGEX "or" operation `|`
stringr::str_extract(patient_condition, paste0(to_match, collapse = "|"))
#[1] "Pre" "Post" "Enriched" "Post" "Pre" "Post" "Pre"
ThomasIsCoding's answer, which uses gregexpr + regmatches, is also a good option.
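For readers who want a base R route, here is a minimal sketch of the gregexpr + regmatches idea. It is not a copy of ThomasIsCoding's code; the names pattern, all_hits and first_hit are made up for illustration, and keeping only the first hit per element is an assumption about what is wanted.

```r
patient_condition <- c("Pre_P1", "Post_P1", "Enriched_Post_P1", "Post_P1_2",
                       "Pre_P2", "Post_P2", "P3_Pre")
to_match <- c("Pre", "Post", "Enriched")

# Combine the candidates into one regex alternation: "Pre|Post|Enriched"
pattern <- paste0(to_match, collapse = "|")

# gregexpr() locates every match in each string; regmatches() extracts them.
# The result is a list with one character vector per input element.
all_hits <- regmatches(patient_condition, gregexpr(pattern, patient_condition))

# Keep only the first hit per element (assumption), NA when nothing matches.
first_hit <- vapply(
  all_hits,
  function(x) if (length(x) > 0) x[1] else NA_character_,
  character(1)
)

first_hit
#[1] "Pre" "Post" "Enriched" "Post" "Pre" "Post" "Pre"
```

Here all_hits[[3]] holds both "Enriched" and "Post", so the list form is useful when an element can contain more than one of the values.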
Note that this performs exact substring matching.
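One reading of "exact" here is that the pattern is matched as written: case matters, and a hit inside a longer word still counts. The sketch below shows this with base R grepl on a made-up vector x (not from the question). The underscore-delimited pattern at the end is only one possible way to tighten the match.

```r
# Hypothetical inputs, chosen to show case and partial-word behaviour
x <- c("Pre_P1", "pre_P1", "Prepared_P1")

# Default: case-sensitive, and "Pre" inside "Prepared" still counts as a hit
default_hit <- grepl("Pre", x)
#[1]  TRUE FALSE  TRUE

# ignore.case = TRUE additionally accepts "pre"
loose_hit <- grepl("Pre", x, ignore.case = TRUE)
#[1] TRUE TRUE TRUE

# Require "Pre" to be a whole underscore-delimited token
token_hit <- grepl("(^|_)Pre(_|$)", x)
#[1]  TRUE FALSE FALSE
```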
https://stackoverflow.com/questions/72957686/