Example: Scraping Taobao Product Information with Selenium

Date: 2021-05-25

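Taobao pages load much of their data with JavaScript, so scraping them with Selenium is simpler than parsing the raw HTML. Selenium is a browser-automation testing tool; in this example it drives the headless browser PhantomJS.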

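import re
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq

'''
wait.until() is Selenium's explicit wait. wait is a WebDriverWait object that
carries a timeout. If the element has not been found in the DOM yet, it keeps
waiting, and once the timeout is exceeded it raises an exception. In other
words, the program checks at a fixed interval: if the condition holds, it moves
on to the next step; otherwise it keeps waiting until the maximum time is
exceeded and then raises TimeoutException.
1. presence_of_element_located: the element has loaded; pass a locator tuple such as (By.ID, 'p')
2. element_to_be_clickable: the element is clickable
3. text_to_be_present_in_element: the element's text contains the given text
'''

# Create a headless browser
browser = webdriver.PhantomJS(
    service_args=[
        '--load-images=false',
        '--disk-cache=true'])
# Give up after 10 s without a response
wait = WebDriverWait(browser, 10)
# Even though the browser is headless, a window size must still be set
browser.set_window_size(1400, 900)


def search():
    '''
    Performs the search from the home page. Swap the selectors to reuse it on other sites.
    :return:
    '''
    print('Searching')
    try:
        # Open the page
        browser.get('https://
        # ... (the source text is cut off here; it resumes mid-statement below) ...
        # ... pile('(\d+)').search(total).group(1))
        # Keep crawling and turning pages as long as more pages remain
        for i in range(2, total + 1):
            next_page(i)
    except Exception:
        print('Something went wrong')
    finally:
        # Close the browser
        browser.close()


if __name__ == '__main__':
    main()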
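The explicit wait described in the code's docstring (check a condition repeatedly, move on as soon as it holds, raise once the time limit passes) can be sketched as a plain polling loop. This is only a simplified illustration of the idea, not Selenium's actual implementation: the names wait_until, WaitTimeout and element_ready are hypothetical, and the "element" is simulated so the sketch runs without a browser. The 0.5 s default interval mirrors WebDriverWait's default poll frequency.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds
    have passed without the condition holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Condition met: hand the result back and stop waiting
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout('condition not met within %s s' % timeout)
        # Not ready yet: sleep briefly, then look again
        time.sleep(poll_interval)


# Simulated page: the "element" only shows up on the third check
attempts = []


def element_ready():
    attempts.append(1)
    return 'element' if len(attempts) >= 3 else None


found = wait_until(element_ready, timeout=2.0, poll_interval=0.01)
print(found, len(attempts))
```

In the scraper itself, wait.until(EC.presence_of_element_located((By.ID, 'p'))) plays the role of wait_until(element_ready): the expected condition is the callable being polled, and the 10 passed to WebDriverWait is the timeout.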

