A Detailed Guide to requests, the Python Request Module for Web Crawlers

Date: 2021-05-23

requests

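Compared with urllib, the third-party library requests is simpler and more user-friendly, and it is one of the most commonly used libraries in web-crawling work.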
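To make the comparison concrete, here is a minimal sketch of the same GET request written with both libraries. The function names, the User-Agent string, and the 10-second timeout are illustrative assumptions, not something taken from the original article.

```python
import urllib.request

import requests


def fetch_with_urllib(url, user_agent="example-crawler/0.1"):
    """Fetch a page with the standard library: build a Request, open, decode."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        # The charset has to be looked up and applied by hand
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_with_requests(url, user_agent="example-crawler/0.1"):
    """Fetch the same page with requests: one call, decoding handled by .text."""
    resp = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return resp.text
```

Neither function is called here, so nothing is sent over the network until you invoke one of them with a real URL.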

Installing requests

Beginner crawlers mostly start with the requests module.
To install the requests module:
On Windows:
In cmd:

pip install requests

On macOS:
In the terminal:

pip3 install requests
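After installing, one quick way to confirm that the module is importable is to print its version. This is only a small sanity check; the helper name `installed_version` is made up for illustration.

```python
import requests


def installed_version():
    """Return the version string of the installed requests package."""
    return requests.__version__


print(installed_version())
```

If the import fails with ModuleNotFoundError, the package was probably installed for a different Python interpreter than the one running the script.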

Basic usage of the requests library

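import requests

# NOTE: the start of this dict was lost in the source; only the tail of what
# appears to be a Cookie value survives. The 'Cookie' key is an assumption,
# and '...' marks the missing part.
header = {
    'Cookie': '...; c_first_page=https%3A//www.csdn.net/; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1597245305,1597254589,1597290418,1597378513; c_segment=1; dc_tos=qf1jz2; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1597387359'
}

url = 'https://www.csdn.net/'
response = requests.get(url, headers=header)
# Print the response body as text
print(response.text)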
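The basic request above can be wrapped in a small helper that also sets a timeout and checks the HTTP status. This is a sketch under assumptions: the function name `fetch_text`, the default User-Agent value, and the 10-second timeout are not from the original article.

```python
import requests


def fetch_text(url, headers=None, timeout=10):
    """GET `url` and return the body as text, raising on HTTP error statuses."""
    if headers is None:
        # Hypothetical default header; replace it with the headers your target needs
        headers = {"User-Agent": "example-crawler/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Calling `fetch_text('https://www.csdn.net/')` would then return the same text that `response.text` prints in the example above, or raise if the server answers with an error status.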

session

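A session identifies a user through information recorded on the server side.
Here, "session" refers to one conversation between the client and the server.
In requests, the session object is a more advanced feature that keeps certain parameters across requests. For example, cookies are stored on a Session instance and shared by every request made through it. As with a browser, we do not need to set the cookie by hand on each request: the Session automatically attaches the cookies it has received to later requests, which is especially convenient when making consecutive requests to the same site.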
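The cookie behavior described above can be observed without touching the network by asking a Session to prepare a request and inspecting the headers it would send. This is a minimal sketch: the cookie name and value are hypothetical and set by hand to stand in for a cookie a server would normally return, and `cookie_header_for` is an illustrative helper.

```python
import requests


def cookie_header_for(session, url):
    """Return the Cookie header that `session` would send to `url`."""
    request = requests.Request("GET", url)
    # prepare_request merges session-level state (cookies, headers) into the request
    prepared = session.prepare_request(request)
    return prepared.headers.get("Cookie")


session = requests.Session()
# Hypothetical cookie, standing in for one received from an earlier response
session.cookies.set("sessionid", "abc123")

print(cookie_header_for(session, "https://www.csdn.net/"))
```

Against a live site, the usual pattern is to call `session.get(...)` or `session.post(...)` once to log in and then keep using the same `session` object for later requests, so the cookies from the first response travel with them.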

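Handling untrusted SSL certificates

What is an SSL certificate?
An SSL certificate is a kind of digital certificate, similar to an electronic copy of a driver's license, passport, or business license.

Because it is installed on a server, it is also called an SSL server certificate. An SSL certificate conforms to the SSL protocol and is issued by a trusted certificate authority (CA) after the CA has verified the server's identity. It provides server authentication and encryption of data in transit.
Let's try crawling a site whose certificate does not pass verification: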

import requests

url = 'https://inv-veri.chinatax.gov.cn/'
resp = requests.get(url)
print(resp.text)

This raises an error: certificate verification fails, and requests raises requests.exceptions.SSLError.

Let's modify the code:

import requests

url = 'https://inv-veri.chinatax.gov.cn/'
# verify=False skips SSL certificate verification
resp = requests.get(url, verify=False)
print(resp.text)

With verify=False, certificate verification is skipped and the code can fetch the page again.
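One possible refinement of the approach above is to try a normally verified request first and fall back to `verify=False` only when verification fails. The sketch below assumes that fallback is acceptable for the site being crawled; the function name `fetch_ignoring_cert` and the timeout value are illustrative. It also silences the InsecureRequestWarning that urllib3 emits for unverified HTTPS requests.

```python
import requests
import urllib3


def fetch_ignoring_cert(url, timeout=10):
    """Try a verified request; on SSLError, retry without certificate checks."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.SSLError:
        # Suppress the warning urllib3 prints for unverified HTTPS requests
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, timeout=timeout)
```

If you have the site's CA certificate as a file, passing its path as `verify='/path/to/ca.pem'` (a placeholder path) keeps verification on instead of disabling it.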

