Scraping Web Page Content with Python and Converting It to a PDF File


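This article shares working Python code for scraping web page content and converting it to PDF, for your reference. The details are as follows.

The script converts Liao Xuefeng's (廖雪峰) tutorials into a PDF file. The code only works for that site. To convert tutorials from another site, it will probably need minor changes, for example to the content class (x-wiki-content) and the title tag (h4) that parse_url_to_html looks up.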

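# coding=utf-8
import os
import re
import sys
import time

import pdfkit
import requests
from bs4 import BeautifulSoup
from PyPDF2 import PdfFileMerger

# Python 2 only: make UTF-8 the default string encoding.
# Remove these two lines on Python 3, where reload() is not a builtin.
reload(sys)
sys.setdefaultencoding('utf8')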

html_template = """
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
</head>
<body>
{content}
</body>
</html>

"""

#----------------------------------------------------------------------
def parse_url_to_html(url, name):
  """
  Parse a URL and return its HTML content.
  :param url: the URL to parse
  :param name: file name under which the HTML is saved
  :return: html
  """
  try:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Main content
    body = soup.find_all(class_="x-wiki-content")[0]
    # Title
    title = soup.find('h4').get_text()

    # Put the title at the very top of the content, centered
    center_tag = soup.new_tag("center")
    title_tag = soup.new_tag('h1')
    title_tag.string = title
    center_tag.insert(1, title_tag)
    body.insert(1, center_tag)
    html = str(body)
    # Change relative src paths of img tags in the body to absolute paths
    pattern = "(
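
The last comment in the listing describes rewriting relative src paths of img tags into absolute ones, so that images still load once the HTML is saved to a local file. Below is a minimal sketch of that idea in Python 3. It is not the author's code: the names IMG_SRC_PATTERN and absolutize_img_src, the regular expression, and the example.com URLs are all assumptions made for illustration, and the pattern only handles double-quoted src attributes.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (an assumption, not taken from the article):
# group 1 = everything up to the opening quote of src,
# group 2 = the src value, group 3 = the closing quote.
IMG_SRC_PATTERN = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]*)(")')


def absolutize_img_src(html, base_url):
    """Return html with every <img> src resolved against base_url."""

    def replace(match):
        prefix, src, suffix = match.groups()
        # urljoin keeps absolute URLs as they are and resolves relative ones.
        return prefix + urljoin(base_url, src) + suffix

    return IMG_SRC_PATTERN.sub(replace, html)


if __name__ == "__main__":
    sample = '<p><img alt="x" src="/files/a.png"/></p>'
    print(absolutize_img_src(sample, "http://www.example.com/wiki/page"))
```

Resolving against the page URL with urljoin, rather than prepending a fixed host, is one possible design choice: it also handles paths that are relative to the current page, such as img/c.png.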

That is all for this article. We hope it helps with your studies, and we hope you will keep supporting 脚本之家.
