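Example of a web e-mail scanner in Python (with source code)

Date: 2021-05-22

Information gathering is a key part of penetration testing, and holding a large amount of information matters a great deal to an attacker. For example, if we know a server's version information, we can test that server against the known vulnerabilities of its framework. Likewise, if we have the e-mail address of the server's administrator, we can mount a phishing attack. Scanning a web site for e-mail addresses is therefore one of the preconditions for a phishing attack.

Below, we use a Python script to crawl a web site and scan it for e-mail addresses. The aim is to learn Python in the course of writing the script.

The complete code is at the end.

Basic approach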

After a target site is passed to the tool, the input first needs a basic check and parse, because addresses may be passed in many different forms, such as http://
', path='', query='', fragment='')

        base_url = '{0.scheme}://{0.netloc}'.format(parts)  # scheme: protocol; netloc: domain
        path = url[:url.rfind('/')+1] if '/' in parts.path else url  # extract the path
        print('[%d] Processing %s' % (count, url))
        try:
            head = {'User-Agent': "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11"}
            response = requests.get(url, headers=head)
        except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
            continue
        # Extract e-mail addresses from the fetched page with a regular expression; re.I ignores case
        new_emails = set(re.findall(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+', response.text, re.I))
        emails.update(new_emails)  # store the addresses found in emails
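To make the two steps above easier to try in isolation, here is a minimal sketch that wraps them in two helper functions. The names split_target and extract_emails and the example.com addresses are illustrative assumptions, not part of the original script; the expressions inside them are the same ones the scanner uses.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the scanner, with the digit range written as 0-9.
EMAIL_RE = re.compile(r'[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+', re.I)


def split_target(url):
    """Return (base_url, path) the way the scanner derives them (hypothetical helper)."""
    parts = urlsplit(url)
    # scheme + netloc, e.g. http://example.com
    base_url = '{0.scheme}://{0.netloc}'.format(parts)
    # Everything up to and including the last '/', if the URL has a path at all.
    path = url[:url.rfind('/') + 1] if '/' in parts.path else url
    return base_url, path


def extract_emails(text):
    """Return the set of e-mail-like strings found in text (hypothetical helper)."""
    return set(EMAIL_RE.findall(text))


if __name__ == '__main__':
    print(split_target('http://example.com/docs/index.html'))
    print(extract_emails('Contact admin@example.com or Sales@Example.org.'))
```

Note that for a URL with no path, such as http://example.com, path is simply the URL itself with no trailing slash, so appending a relative link to it would produce a malformed address.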

Following anchors to the next page to continue the search

        soup = BeautifulSoup(response.text, features='lxml')
        for anchor in soup.find_all('a'):  # find anchors; in HTML an <a> tag is a hyperlink and its href attribute is the link address
            # If the tag has an href attribute, the address in it is the anchor link we need
            link = anchor.attrs['href'] if 'href' in anchor.attrs else ''
            if link.startswith('/'):
                # A link starting with / is only a path, so prepend the protocol and domain;
                # base_url is the protocol + domain separated out earlier
                link = base_url + link
            elif not link.startswith('http'):
                # A link that starts with neither / nor http needs the current path in front of it
                link = path + link
            if link not in urls and link not in scraped_urls:
                # Queue the link if it has not been recorded before
                urls.append(link)
except KeyboardInterrupt:
    print('[+] Closing')

for mail in emails:
    print(mail)
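The manual handling of / and http prefixes above can also be done with urllib.parse.urljoin from the standard library. The following is only a sketch of that alternative, not the article's method: it swaps BeautifulSoup for the standard-library html.parser.HTMLParser so that it runs without third-party packages, and the names AnchorCollector and resolve_links, as well as the sample URLs, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip anchors without a usable href.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def resolve_links(page_url, html):
    """Return absolute URLs for the anchors in html, resolved against page_url."""
    collector = AnchorCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


if __name__ == '__main__':
    sample = ('<a href="/about">About</a> '
              '<a href="team.html">Team</a> '
              '<a href="http://other.example/x">X</a>')
    for link in resolve_links('http://example.com/docs/index.html', sample):
        print(link)
```

urljoin covers the three cases in one call: root-relative links, links relative to the current page, and links that are already absolute. It also handles a base URL that has no path.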

Complete code

from bs4 import BeautifulSoup
import requests
import requests.exceptions
import urllib.parse
from collections import deque
import re

user_url = input('[+] Enter Target URL to Scan:')
urls = deque([user_url])  # queue of pages still to visit
scraped_urls = set()      # pages already visited
emails = set()            # e-mail addresses found so far
count = 0

try:
    while len(urls):
        count += 1
        if count == 100:  # stop once the counter reaches 100
            break
        url = urls.popleft()
        scraped_urls.add(url)

        parts = urllib.parse.urlsplit(url)
        base_url = '{0.scheme}://{0.netloc}'.format(parts)
        path = url[:url.rfind('/')+1] if '/' in parts.path else url
        print('[%d] Processing %s' % (count, url))

        try:
            head = {'User-Agent': "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11"}
            response = requests.get(url, headers=head)
        except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
            continue

        new_emails = set(re.findall(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+', response.text, re.I))
        emails.update(new_emails)

        soup = BeautifulSoup(response.text, features='lxml')
        for anchor in soup.find_all('a'):
            link = anchor.attrs['href'] if 'href' in anchor.attrs else ''
            if link.startswith('/'):
                link = base_url + link
            elif not link.startswith('http'):
                link = path + link
            if link not in urls and link not in scraped_urls:
                urls.append(link)
except KeyboardInterrupt:
    print('[+] Closing')

for mail in emails:
    print(mail)
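The same breadth-first loop can be packaged as a function whose page fetching is passed in as a parameter, which makes it possible to exercise the crawl logic without touching the network. This is a sketch under stated assumptions rather than a replacement for the script above: the crawl function, its fetch and max_pages parameters, and the in-memory pages dictionary are all hypothetical, and the standard-library HTMLParser and urljoin stand in for BeautifulSoup and the manual link handling.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

EMAIL_RE = re.compile(r'[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+', re.I)


class _Anchors(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.extend(v for k, v in attrs if k == 'href' and v)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; return the set of e-mails found.

    fetch(url) must return the page text, or None if the page cannot be read.
    """
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, for fast membership tests
    emails = set()
    processed = 0
    while queue and processed < max_pages:
        url = queue.popleft()
        processed += 1
        text = fetch(url)
        if text is None:
            continue
        emails.update(EMAIL_RE.findall(text))
        parser = _Anchors()
        parser.feed(text)
        for href in parser.hrefs:
            link = urljoin(url, href)
            # Only queue http(s) links that have not been queued before.
            if link.startswith('http') and link not in seen:
                seen.add(link)
                queue.append(link)
    return emails


if __name__ == '__main__':
    # Hypothetical in-memory "site" standing in for real HTTP responses.
    pages = {
        'http://example.com/': '<a href="/team">Team</a> info@example.com',
        'http://example.com/team': '<a href="/">Home</a> alice@example.com',
    }
    print(sorted(crawl('http://example.com/', pages.get)))
```

For a real scan, fetch could be a small wrapper around requests.get that returns response.text and returns None when a requests.exceptions.RequestException is raised. Keeping visited URLs in a set also avoids the linear search that "link not in urls" performs on a deque.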

Experiment………………

