How to find similar words in a string vector in R?
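Strings in a character vector sometimes contain spelling mistakes. Extracting the words that are similar to each other helps to spot such mistakes, because a group of similar words is likely to hold the correct and the incorrect forms of the same word. This can be done by combining the agrep and lapply functions: agrep performs approximate (fuzzy) matching of a pattern against a vector, and lapply applies it with every element of the vector as the pattern in turn.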
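The following is a minimal sketch of how the two functions fit together, written before the worked examples. The vector `words` and the variable names `single` and `all_matches` are made up for illustration. It first runs one agrep call for a single pattern, then lets lapply repeat that call for every element. It relies on the default agrep tolerance (`max.distance = 0.1`, a fraction of the pattern length that is rounded up).

```r
# Hypothetical sample data, not taken from the examples in this article
words <- c("apple", "aple", "banana", "bananna", "cherry")

# One call: which elements of `words` approximately match "apple"?
# value = TRUE returns the matching strings instead of their indices.
single <- agrep("apple", words, value = TRUE)
print(single)

# lapply() passes each element of `words` as the pattern (first argument)
# and forwards `words` and value = TRUE to agrep() unchanged.
all_matches <- lapply(words, agrep, words, value = TRUE)

# Naming the list by pattern makes the result easier to read
names(all_matches) <- words
print(all_matches)
```

Naming the result by pattern is optional; without it the list elements are only numbered, as in the outputs shown in the examples.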
Example 1
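# Country names, some of them misspelled
x1 <- c("India", "United Kingdoms", "Indiaa", "Egyypt", "United Kingdom", "Turkey", "Egypt", "Belaarus", "Belarus")
# For each element, return every element of x1 that approximately matches it
lapply(x1, agrep, x1, value = TRUE)
Output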
[[1]]
[1] "India" "Indiaa"

[[2]]
[1] "United Kingdoms" "United Kingdom"

[[3]]
[1] "India" "Indiaa"

[[4]]
[1] "Egyypt" "Egypt"

[[5]]
[1] "United Kingdoms" "United Kingdom"

[[6]]
[1] "Turkey"

[[7]]
[1] "Egyypt" "Egypt"

[[8]]
[1] "Belaarus" "Belarus"

[[9]]
[1] "Belaarus" "Belarus"
Example 2
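# Personal names, some of them misspelled
x2 <- c("Al-hadi", "Umair", "Omar", "Alhadi", "Shanti", "Shant", "Umaer", "Peter", "Rahul", "Pattrick", "Peeter", "Rahuls")
# For each element, return every element of x2 that approximately matches it
lapply(x2, agrep, x2, value = TRUE)
Output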
[[1]]
[1] "Al-hadi" "Alhadi"

[[2]]
[1] "Umair" "Umaer"

[[3]]
[1] "Omar"

[[4]]
[1] "Al-hadi" "Alhadi"

[[5]]
[1] "Shanti" "Shant"

[[6]]
[1] "Shanti" "Shant"

[[7]]
[1] "Umair" "Umaer"

[[8]]
[1] "Peter" "Peeter"

[[9]]
[1] "Rahul" "Rahuls"

[[10]]
[1] "Pattrick"

[[11]]
[1] "Peter" "Peeter"

[[12]]
[1] "Rahul" "Rahuls"
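The result lists every group once per member, so the same pair shows up under both of its spellings. One possible way to collapse the repeats, sketched here as an assumption rather than as part of the original method, is to sort each match vector and keep only the distinct ones. The names `matches` and `groups` are illustrative, and the vector uses the names that appear in the output above.

```r
# Names as they appear in the output of Example 2
x2 <- c("Al-hadi", "Umair", "Omar", "Alhadi", "Shanti", "Shant",
        "Umaer", "Peter", "Rahul", "Pattrick", "Peeter", "Rahuls")

# One vector of approximate matches per element, as in the example
matches <- lapply(x2, agrep, x2, value = TRUE)

# Sort each vector so equal groups compare as identical,
# then drop the duplicated groups
groups <- unique(lapply(matches, sort))
print(groups)
```

This only merges groups that contain exactly the same members. When matching is not symmetric (one spelling finds the other but not the reverse), the two resulting vectors differ and both are kept.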
Example 3
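# US state names, some of them misspelled
x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma", "Florida", "Illinois", "Texas", "Illinoise")
# For each element, return every element of x3 that approximately matches it
lapply(x3, agrep, x3, value = TRUE)
Output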
[[1]]
[1] "Alabamaa"

[[2]]
[1] "New Yorky"

[[3]]
[1] "New Yorky" "New Yok"

[[4]]
[1] "Alabamaa" "Alabma"

[[5]]
[1] "Florida"

[[6]]
[1] "Illinois" "Illinoise"

[[7]]
[1] "Texas"

[[8]]
[1] "Illinois" "Illinoise"
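In this output "New Yok" finds "New Yorky", but "New Yorky" does not find "New Yok". A likely explanation is that agrep looks for the pattern approximately inside each string and, by default, allows edits of up to 10% of the pattern length, rounded up, which is a single edit for these patterns. Turning "New Yorky" into "New Yok" takes two deletions. The sketch below, with the illustrative names `strict` and `loose`, raises the tolerance through the `max.distance` argument of agrep; the value 2 is an assumption chosen for this data, not a general recommendation.

```r
x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma",
        "Florida", "Illinois", "Texas", "Illinoise")

# Default tolerance: only the pattern itself is returned here
strict <- agrep("New Yorky", x3, value = TRUE)

# Allow up to 2 edits so that "New Yok" is also reached from "New Yorky"
loose <- agrep("New Yorky", x3, max.distance = 2, value = TRUE)

print(strict)
print(loose)
```

A larger tolerance also raises the chance of grouping words that are genuinely different, so it is worth checking the groups by eye before correcting anything.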