
Matching the Encoding of an HTML Page with Python Regular Expressions

2024-04-02 11:35:09

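An HTML page usually declares its character encoding. Getting that encoding is the first step in processing the page, because a wrong encoding inevitably breaks everything that comes after it. Here is a regular expression I wrote in Python to extract it: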

import re

# Sample <meta> declarations to test the pattern against
a = ['<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
     '<meta http-equiv=Content-Type content="text/html; charset=gb2312">',
     '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">',
     '<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />',
     '<meta http-equiv="content-type" content="text/html; charset=utf-8" />',
     '<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />',
     '<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />',
     ]

# Quotes around attribute values are optional; group 1 captures the charset name
b = r'''<meta\s+http-equiv=["']?content-type["']?\s+content=["']?text/html;\s*charset=([-0-9a-zA-Z_]+)["']?'''

# Tag and attribute names are case-insensitive in HTML
B = re.compile(b, re.IGNORECASE)

for ax in a:
    r1 = B.search(ax)
    if r1:
        print(r1.group())                    # the whole matched declaration
        print(r1.group(1), len(r1.group()))  # charset name and match length
    else:
        print('not match')
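The script above runs on strings that are already decoded, but in practice the charset has to be found before the page can be decoded. A minimal sketch, assuming the page arrives as raw bytes (for example an HTTP response body): the same pattern is compiled as a bytes regex, the declared name is checked with the standard library's `codecs.lookup`, and the bytes are then decoded with it. The helper names `sniff_charset` and `decode_html`, the UTF-8 fallback, and the 1024-byte scan limit are illustrative assumptions, not part of the original article.

```python
import codecs
import re

# The same pattern as above, compiled as a *bytes* regex so it can run
# on the raw page before anything has been decoded.
META_CHARSET = re.compile(
    rb'''<meta\s+http-equiv=["']?content-type["']?\s+'''
    rb'''content=["']?text/html;\s*charset=([-0-9a-zA-Z_]+)''',
    re.IGNORECASE,
)


def sniff_charset(raw: bytes, default: str = "utf-8", limit: int = 1024) -> str:
    """Hypothetical helper: return the charset declared in a <meta> tag.

    Falls back to `default` when no declaration is found or Python does not
    know the declared name. Only the first `limit` bytes are scanned (an
    assumption: the declaration sits near the top of <head>).
    """
    m = META_CHARSET.search(raw[:limit])
    if m is None:
        return default
    name = m.group(1).decode("ascii")  # the group only matches ASCII characters
    try:
        # codecs.lookup raises LookupError for unknown encoding names
        return codecs.lookup(name).name
    except LookupError:
        return default


def decode_html(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: decode raw HTML with the sniffed charset."""
    # errors="replace" keeps going if the page lies about its encoding
    return raw.decode(sniff_charset(raw, default), errors="replace")


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=gb2312"></head>'
        '<body>中文</body></html>').encode("gb2312")
print(sniff_charset(page))
print(decode_html(page))
```

Normalizing the name through `codecs.lookup` means aliases such as `GB2312` and `gb2312` end up as the same codec, and an unrecognized name degrades to the fallback instead of raising during decoding.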
