Python example: extracting words from text and counting word frequency

Date: 2021-05-22

I use these text operations often, so I am collecting them here. More will be added over time.

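Operations:

strip_html(cls, text): removes HTML tags, replacing each tag with a space

separate_words(cls, text, min_lenth=3): splits text into lowercase words, keeping only words longer than min_lenth characters

get_words_frequency(cls, words_list): counts how often each word occurs and returns a dict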

Source code:

import re


class DocProcess(object):

    @classmethod
    def strip_html(cls, text):
        """
        Delete html tags in text.
        text is String
        """
        new_text = " "
        is_html = False  # True while the scan is inside <...>
        for character in text:
            if character == "<":
                is_html = True
            elif character == ">":
                is_html = False
                new_text += " "  # a space keeps adjacent words apart
            elif is_html is False:
                new_text += character
        return new_text

    @classmethod
    def separate_words(cls, text, min_lenth=3):
        """
        Separate text into words in list.
        """
        # Split on runs of non-word characters; keep words strictly
        # longer than min_lenth and lowercase them.
        splitter = re.compile(r"\W+")
        return [s.lower() for s in splitter.split(text) if len(s) > min_lenth]

    @classmethod
    def get_words_frequency(cls, words_list):
        """
        Get frequency of words in words_list.
        return a dict.
        """
        num_words = {}
        for word in words_list:
            num_words[word] = num_words.get(word, 0) + 1
        return num_words
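The three steps chain naturally: strip tags, split into words, then count. The sketch below is not the class above but a standard-library variant of the same pipeline, shown only as one possible approach. The helper names (strip_tags, words_longer_than, word_counts) and the sample string are hypothetical. It swaps the character scan for a simple regular expression, which, like the original, is only suitable for simple markup, and it swaps the hand-written dict for collections.Counter.

```python
import re
from collections import Counter

# Hypothetical sample input, not taken from the original post.
SAMPLE_HTML = (
    "<p>Python makes text processing simple.</p>"
    "<p>Text processing with Python!</p>"
)


def strip_tags(text):
    """Replace every <...> run with a space (assumes simple, well-formed tags)."""
    return re.sub(r"<[^>]*>", " ", text)


def words_longer_than(text, min_lenth=3):
    """Return lowercased words strictly longer than min_lenth characters."""
    return [w.lower() for w in re.split(r"\W+", text) if len(w) > min_lenth]


def word_counts(words):
    """Count words; Counter is a dict subclass, so dict-style lookups still work."""
    return Counter(words)


counts = word_counts(words_longer_than(strip_tags(SAMPLE_HTML)))

# most_common(n) lists the n highest counts, which a plain dict does not offer.
print(counts.most_common(3))
```

Counter mainly saves the manual `get(word, 0) + 1` bookkeeping and adds most_common for ranking. The plain-dict version above remains a reasonable choice when no ranking is needed.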

