A Complete Example of a Winxuan (文轩网) Crawler in Python

Date: 2021-05-22

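This article walks through a crawler for the Winxuan online bookstore (文轩网) written in Python. It is shared here for reference; the full listing follows: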

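# encoding=utf8
# Note: this listing uses Python 2 syntax (reload(sys), print statements, "except Exception, e").
import pymysql
import time
import sys
import requests
import os
# Capture errors
import traceback
import types
# Escape HTML entities
import cgi
import warnings

reload(sys)
sys.setdefaultencoding('utf-8')

from pyquery import PyQuery as pq
from lxml import etree

# Suppress warnings
warnings.filterwarnings("ignore")


# Download an image
def dowloadPic(imageUrl, filePath):
    r = requests.get(imageUrl, timeout=60)
    status = r.status_code
    if status == 404:
        return 404
    with open(filePath, "wb") as code:
        code.write(r.content)


# Scrape the data from a detail-page URL and insert it into the database
def getData(final_url):
    file_open = open('./url.txt', 'w')
    file_open.write(final_url)
    file_open.close()
    # Connect to the database
    conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='root', db='bookinfo', charset='utf8')
    # Create the cursor
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    # Parse the detail page
    try:
        detail_url = final_url
        c = pq(detail_url)
        head = c('html').attr('xmlns')
        err = 'http:///'
        # [The listing is cut off here in the source: the rest of getData()
        #  and the lines that define home_url are missing.]


# Crawl one category; the header is inferred from the call winxuan(n) at the
# bottom of the script, the original line is missing from the source.
def winxuan(n):
    h = pq(home_url)
    # Category navigation link
    menu = h('.mod-mainmenu').find('dd').find('a').eq(n).attr('href')
    # print menu
    # First page of books in the category
    try:
        mh = pq(menu)
    except Exception, e:
        return 'backs'
    # text = mh('.main').find('a').text()
    # text = text.encode("GBK", "ignore");
    li = []
    u = 0
    while u < 248:
        detail_urls = mh('.main').find('a').eq(u).attr('href')
        # Collect every URL found into the list
        li.append(detail_urls)
        u += 1
    # De-duplicate the list
    li = list(set(li))
    for final_url in li:
        try:
            result = getData(final_url)
        except Exception, e:
            continue
        if result == 'back':
            continue
    print 'OK,finished'


n = 0
while n < 58:
    print n
    # Record the index of the category being processed
    string = str(n)
    file_open = open('./number.txt', 'w')
    file_open.write(string)
    file_open.close()
    res = winxuan(n)
    n += 1
    if res == 'backs':
        continue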
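
Two steps in the listing above are worth isolating. It looks up 248 links by index and de-duplicates them with set(), which discards page order and, on a page with fewer than 248 links, probably leaves an empty entry in the list. It also writes the current category index to number.txt but never reads it back. The following is a minimal Python 3 sketch of both steps using only the standard library. The helper names unique_links, save_progress and load_progress are hypothetical and do not appear in the original code; reading the saved index back to resume a run is an assumption about what number.txt was meant for.

```python
import os
import tempfile


def unique_links(hrefs):
    """Drop empty entries and duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        # Skip None / "" (missing href) and anything already collected.
        if not href or href in seen:
            continue
        seen.add(href)
        result.append(href)
    return result


def save_progress(path, index):
    """Record the category index currently being processed."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(str(index))


def load_progress(path, default=0):
    """Return the saved index, or `default` if the file is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return default


if __name__ == "__main__":
    # Hypothetical sample data standing in for the href values of one page.
    links = ["/book/1", None, "/book/2", "/book/1", ""]
    print(unique_links(links))

    with tempfile.TemporaryDirectory() as tmp:
        progress_file = os.path.join(tmp, "number.txt")
        print(load_progress(progress_file))   # no file yet, falls back to default
        save_progress(progress_file, 17)
        print(load_progress(progress_file))
```

With helpers like these, the main loop could start from load_progress('./number.txt') instead of 0, so that an interrupted run picks up at the category where it stopped.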


We hope this article is helpful for your Python programming.
