Example: Fetching HTML Page Resources with Python's urllib2 Module

Date: 2021-05-22

First, put the URLs to fetch in a separate list file, one URL per line (the script reads it as u1.list):

https://www.jb51.net/article/83440.html
https://www.jb51.net/article/83437.html
https://www.jb51.net/article/83430.html
https://www.jb51.net/article/83449.html

Next, look at the program itself. The code is as follows:

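#!/usr/bin/python
# Python 2 script: urllib2 does not exist in Python 3.
import os
import urllib2


def Cdown_data(fileurl, fpath, dpath):
    # Create the target directory, then download fileurl and save it to fpath.
    if not os.path.exists(dpath):
        os.makedirs(dpath)
    try:
        getfile = urllib2.urlopen(fileurl)
        data = getfile.read()
        getfile.close()
        # Write in binary mode so the response bytes are saved unchanged.
        with open(fpath, 'wb') as f:
            f.write(data)
    except (urllib2.URLError, IOError) as e:
        print fileurl, 'download failed:', e


with open('u1.list') as lines:
    for line in lines:
        URI = line.strip()
        # Skip URLs with a query string or percent-encoding.
        if '?' in URI or '%' in URI:
            continue
        # Exactly two slashes means "scheme://host" with no page path.
        elif URI.count('/') == 2:
            continue
        elif URI.count('/') > 2:
            try:
                # e.g. https://host/a/b.html -> dirpath "host/a", filepath "host/a/b.html"
                dirpath = URI.rpartition('/')[0].split('//')[1]
                filepath = URI.split('//')[1]
                if filepath:
                    print URI, filepath, dirpath
                    Cdown_data(URI, filepath, dirpath)
            except Exception:
                print URI, 'error'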
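
The filtering and path rules in the script can be easier to follow in isolation. The sketch below is not part of the original article: plan_download is a hypothetical helper name, and it assumes the intent of the first check is to skip any URL containing either '?' or '%'. It performs no network or disk access and only shows which local paths the script would derive from a URL. It uses plain string operations, so it should run under both Python 2 and Python 3.

```python
def plan_download(uri):
    """Return (filepath, dirpath) for a URL, or None if it would be skipped.

    Hypothetical helper for illustration only; it mirrors the string
    handling of the script but downloads nothing.
    """
    uri = uri.strip()
    # Assumption: skip URLs with a query string or percent-encoding.
    # Note that `'?' and '%' in uri` would only test for '%', because
    # the non-empty string '?' is always truthy.
    if '?' in uri or '%' in uri:
        return None
    # "scheme://host" has exactly two slashes, so there is no page path.
    if uri.count('/') <= 2:
        return None
    parts = uri.split('//')
    if len(parts) < 2:
        # No "//" separator, so the host cannot be located.
        return None
    # Everything after "//" becomes the local file path.
    filepath = parts[1]
    # The same path without its last component becomes the directory.
    dirpath = uri.rpartition('/')[0].split('//')[1]
    return filepath, dirpath


if __name__ == '__main__':
    for url in ('https://www.jb51.net/article/83440.html',
                'https://www.jb51.net',
                'https://www.jb51.net/search?q=python'):
        print('%s -> %r' % (url, plan_download(url)))
```

For the first URL in the list file, this yields the file path www.jb51.net/article/83440.html inside the directory www.jb51.net/article; the other two example URLs are skipped.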
