
Python: an example of extracting words from text and counting word frequency

2023-09-02 14:25:04

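These text operations come up often, so I am summarizing them here. More will be added over time.

Operations:

strip_html(cls, text): remove HTML tags

separate_words(cls, text, min_lenth=3): extract words from text

get_words_frequency(cls, words_list): get word frequencies

Source code: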

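import re


class DocProcess(object):

    @classmethod
    def strip_html(cls, text):
        """
        Delete HTML tags in text.
        text is a string.
        """
        new_text = ""
        is_html = False  # True while we are between "<" and ">"
        for character in text:
            if character == "<":
                is_html = True
            elif character == ">":
                is_html = False
            elif not is_html:
                # Keep only characters that are outside a tag.
                new_text += character
        return new_text

    @classmethod
    def separate_words(cls, text, min_lenth=3):
        """
        Separate text into a list of lowercase words.
        Only words strictly longer than min_lenth characters are kept.
        """
        # Split on runs of non-word characters (anything other than
        # letters, digits and underscore).
        splitter = re.compile(r"\W+")
        return [s.lower() for s in splitter.split(text) if len(s) > min_lenth]

    @classmethod
    def get_words_frequency(cls, words_list):
        """
        Get the frequency of each word in words_list.
        Returns a dict mapping word -> count.
        """
        num_words = {}
        for word in words_list:
            # Start from 0 the first time a word is seen.
            num_words[word] = num_words.get(word, 0) + 1
        return num_words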
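For comparison, here is a minimal sketch of the same three steps built on the standard library's `html.parser.HTMLParser` and `collections.Counter`. This is an alternative, not the class above. The helper names `_TextExtractor` and `word_frequency` and the sample string are made up for illustration. The length filter (`len(w) > min_lenth`) is copied from the source so the results stay comparable.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML string."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by HTMLParser for text found between tags.
        self.parts.append(data)


def strip_html(text):
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join with spaces so words on either side of a tag do not run together.
    return " ".join(parser.parts)


def word_frequency(text, min_lenth=3):
    # Same filter as the source: keep words strictly longer than min_lenth.
    words = [w.lower() for w in re.split(r"\W+", text) if len(w) > min_lenth]
    return Counter(words)


sample = "<p>Python text <b>processing</b> with Python</p>"
freq = word_frequency(strip_html(sample))
print(freq.most_common(1))
```

`Counter` is a `dict` subclass, so it can be used wherever the dict returned by `get_words_frequency` is expected, and `most_common(n)` returns the `n` most frequent words without a manual sort.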

