Example Code for Several Scrapy Spider Crawling Patterns

Date: 2021-05-22

This lesson introduces the Scrapy crawling framework, focusing on one of its components: the spider.

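A spider can crawl in several ways:

Crawl the content of a single page

Build links from a given list and crawl multiple pages

Find the "next page" link and follow it

Follow the links on a page and crawl the pages they lead to

Examples are given below.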

1. Crawling a single page

# by 寒小阳 (hanxiaoyang.ml@gmail.com)
import scrapy


class JulyeduSpider(scrapy.Spider):
    name = "julyedu"
    # The host part of this URL is missing in the source text.
    start_urls = ['https:///society_index.shtml']

    def parse(self, response):
        # Collect the article links on the index page and request each one.
        for href in response.xpath('//*[@id="news"]/div/div/div/div/em/a/@href'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # Extract each field once, print it for inspection, then yield the item.
        title = response.xpath('//div[@class="qq_article"]/div/h1/text()').extract_first()
        pub_time = response.xpath('//span[@class="a_time"]/text()').extract_first()
        cate = response.xpath('//span[@class="a_catalog"]/a/text()').extract_first()
        content = "\n".join(
            response.xpath('//div[@id="Cnt-Main-Article-QQ"]/p[@class="text"]/text()').extract())
        print(title)
        print(pub_time)
        print(cate)
        print(content)
        print("")
        yield {
            'title': title,
            'content': content,
            'time': pub_time,
            'cate': cate,
        }

Summary

That covers this article's example code for several Scrapy spider crawling patterns; we hope it is helpful. If you notice any shortcomings, please leave a comment.
