My code:
# -*- coding: utf-8 -*-
import requests
from time import ctime
from lxml import etree
from bs4 import BeautifulSoup
url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
tries = 300
web_data = requests.get(url).text
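# step 1: parse the page 300 times with lxml
print('lxml start at:', ctime())
while tries > 0:
    lxml_page = etree.HTML(web_data)
    tries = tries - 1
print('lxml done at:', ctime())
# step 2: parse the page 300 times with BeautifulSoup (lxml backend)
tries = 300  # reset the counter, since step 1 counts it down to 0
print('soup start at:', ctime())
while tries > 0:
    soup_page = BeautifulSoup(web_data, 'lxml')
    tries = tries - 1
print('soup done at:', ctime())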
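I ran it in two separate passes: first I commented out step 2 and ran step 1; then I commented out step 1 and ran step 2. I'm a beginner, so please go easy on me.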
Results:
Parsing one blog page 300 times took about 8 seconds with BeautifulSoup and about 1 second with lxml.
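A possible refinement, offered only as a sketch: ctime() has one-second resolution, so the figures above are rough. Timing each loop with time.perf_counter() should give finer numbers. The helper name time_parser and the sample string below are hypothetical, and the standard-library HTMLParser is used purely as a stand-in so the snippet runs without a network connection or third-party packages.

```python
import time
from html.parser import HTMLParser


def time_parser(parse, data, repeats=300):
    """Return the total seconds spent calling parse(data) `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        parse(data)
    return time.perf_counter() - start


if __name__ == '__main__':
    # Stand-in document (an assumption, not the blog page from the test above).
    sample = '<html><body><p>hello</p></body></html>'
    # Stand-in parser: a fresh stdlib HTMLParser is fed the document each time.
    elapsed = time_parser(lambda html: HTMLParser().feed(html), sample)
    print('html.parser: %.4f s' % elapsed)
```

To repeat the comparison above, the same helper could be called as time_parser(etree.HTML, web_data) and time_parser(lambda html: BeautifulSoup(html, 'lxml'), web_data), assuming lxml and bs4 are installed and web_data holds the downloaded page.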