Python处理PDF及生成多层PDF实例代码

时间：2021-05-22

Python提供了众多的PDF支持库，本文是在Python3环境下，试用了两个库来完成PDF的生成的功能。PyPDF对于读取PDF支持较好，但是没找到生成多层PDF的方法。Reportlab看起来更成熟，能够利用Canvas很方便的生成多层PDF，这样就能够实现图片扫描上来的内容也可以进行内容搜索的目标。

Reportlab

生成双层PDF

双层PDF应用PDF中的Canvas概念，先画文字，最后将图片画上去，这样就是两层的PDF。import os# import urllib2import timefrom reportlab import platypusfrom reportlab.lib.pagesizes import letterfrom reportlab.lib.units import inchfrom reportlab.platypus import SimpleDocTemplate, Imagefrom reportlab.pdfgen import canvasimage_file = "./42.png"# Use Canvas to generate pdfc = canvas.Canvas('reportlab_canvas.pdf', pagesize=letter)width, height = letterc.setFillColorRGB(0,0.77,0.77)# say hello (note after rotate the y coord needs to be negative!)c.drawString( 3*inch, 3*inch, "Hello World")c.drawImage(image_file, 0 , 0)c.showPage()c.save()

PyPDF2

读取PDF

from PyPDF2 import PdfFileWriter, PdfFileReaderoutput = PdfFileWriter()input1 = PdfFileReader(open("jquery.pdf", "rb"))# print document infoprint(input1.getDocumentInfo())# print how many pages input1 has:print ("pdf_document.pdf has %d pages." % input1.getNumPages())# print page contentpage_content = input1.getPage(0).extractText()print( page_content )# add page 1 from input1 to output document, unchangedoutput.addPage(input1.getPage(0))# add page 2 from input1, but rotated clockwise 90 degreesoutput.addPage(input1.getPage(1).rotateClockwise(90))# finally, write "output" to document-output.pdfoutputStream = open("PyPDF2-output.pdf", "wb")output.write(outputStream)

但是PyPDF获取PDF内容有很多问题，可以看这个问题列表。文档中也有说明。

| extractText(self) | ## | # Locate all text drawing commands, in the order they are provided in the | # content stream, and extract the text. This works well for some PDF | # files, but poorly for others, depending on the generator used. This will | # be refined in the future. Do not rely on the order of text coming out of | # this function, as it will change if this function is made more | # sophisticated. | # | # Stability: Added in v1.7, will exist for all future v1.x releases. May | # be overhauled to provide more ordered text in the future. | # @return a unicode string object

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持。

Python处理PDF及生成多层PDF实例代码

相关文章

PHP中使用Imagick读取pdf并生成png缩略图实例

Python实现简单拆分PDF文件的方法

Python生成pdf文件的方法

将Python字符串生成PDF的实例代码详解

PHP中使用mpdf 导出PDF文件的实现方法