I have a regular expression question.
I have a list of file names, shown below.
df <- c("Alilis CELF-4_CF_Data_Entry.xlsx" , "Ana T. CELF-4_CF_Data_Entry.xlsx" , "Ana V. CELF-4_CF_Data_Entry.xlsx","Anita CELF-4_CF_Data_Entry.xlsx")
[1] "Alilis CELF-4_CF_Data_Entry.xlsx" "Ana T. CELF-4_CF_Data_Entry.xlsx" "Ana V. CELF-4_CF_Data_Entry.xlsx" "Anita CELF-4_CF_Data_Entry.xlsx"
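I need to extract the name at the start of each string, but some names include a single-letter initial followed by a period (e.g. Ana V.), and I can't extract those initials.
Using the code below,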
library(stringr)  # word() comes from the stringr package
unique(word(df, 1))
[1] "Alilis" "Ana" "Anita"
How can I get the following instead?
[1] "Alilis" "Ana T." "Ana V." "Anita"
Best answer
Try
gsub("^((\\S+)|^(\\w+ [A-Z]\\.))\\s+.*", "\\1", df)
[1] "Alilis" "Ana T." "Ana V." "Anita"
It should also work if there are multiple spaces:
gsub("^((\\S+)|^(\\w+ [A-Z]\\.))\\s+.*", "\\1", c(df, "Allis hello CELF-4_Data_Entry.xlsx"))
[1] "Alilis" "Ana T." "Ana V." "Anita" "Allis"
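The same idea can also be written as "first word, optionally followed by a space, one capital letter, and a period", which some may find easier to read. The following is only a sketch using base R's regexpr() and regmatches() with perl = TRUE; the helper name extract_name and the vector files are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (not from the original answer): pull the leading name,
# keeping a trailing single-letter initial such as "T." when one is present.
extract_name <- function(x) {
  # ^\S+           the first run of non-space characters (the name)
  # (?: [A-Z]\.)?  optionally a space, one capital letter, and a literal period
  m <- regexpr("^\\S+(?: [A-Z]\\.)?", x, perl = TRUE)
  # regmatches() returns only the matched substrings; elements with no match
  # (e.g. an empty string) would be dropped from the result.
  regmatches(x, m)
}

# Illustrative input: the file names from the question plus the extra case.
files <- c("Alilis CELF-4_CF_Data_Entry.xlsx",
           "Ana T. CELF-4_CF_Data_Entry.xlsx",
           "Ana V. CELF-4_CF_Data_Entry.xlsx",
           "Anita CELF-4_CF_Data_Entry.xlsx",
           "Allis hello CELF-4_Data_Entry.xlsx")

extract_name(files)
# [1] "Alilis" "Ana T." "Ana V." "Anita"  "Allis"
```

Because this extracts the match rather than replacing the rest of the string, it does not depend on what follows the name, but it does assume every element starts with a non-space character.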
https://stackoverflow.com/questions/73831057/