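- Install BeautifulSoup with pip, Python's package manager. The code below parses pages with the lxml parser, so install lxml as well:
pip install beautifulsoup4 lxml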
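Passing 'lxml' to BeautifulSoup only works if the separate lxml package is installed. Otherwise the call raises bs4.FeatureNotFound. The following is a small sketch for checking which parsers are usable. The helper name parser_available is hypothetical; it is not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parser_available(name):
    """Hypothetical helper: True if BeautifulSoup can build a tree with parser `name`."""
    try:
        BeautifulSoup("<p>ok</p>", name)
    except FeatureNotFound:
        # bs4 raises this when no installed tree builder matches the requested name
        return False
    return True


# "html.parser" ships with Python; "lxml" needs `pip install lxml`
for parser in ("html.parser", "lxml"):
    print(parser, parser_available(parser))
```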
- Create a file named 2.py and copy the following code into it:
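#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
from urllib.request import urlopen  # urllib.request is Python 3 only
from bs4 import BeautifulSoup

html = urlopen("http://www.toutiao.com")    # download the page
bsObj = BeautifulSoup(html.read(), 'lxml')  # parse the HTML with the lxml parser
print(bsObj.title)                          # print the <title> tag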
- Run python 2.py. It prints “<title>今日头条</title>”, which means the page title was retrieved successfully.
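The sketch below shows what bsObj.title actually returns without touching the network. It parses a hard-coded HTML string, a made-up sample standing in for the downloaded page. It uses Python's built-in "html.parser" instead of lxml, so only beautifulsoup4 is needed. soup.title is shorthand for soup.find("title"). It returns the first matching Tag, or None when the page has no such tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the HTML returned by urlopen()
SAMPLE = "<html><head><title>Example page</title></head><body><p>Hi</p></body></html>"

soup = BeautifulSoup(SAMPLE, "html.parser")  # built-in parser, no lxml required
print(soup.title)         # the whole Tag, markup included
print(soup.title.string)  # only the text inside the tag
print(soup.title.name)    # the tag name

# A page without <title>: the lookup returns None instead of raising
empty = BeautifulSoup("<html><body></body></html>", "html.parser")
print(empty.title)

# Chaining onto that None is what raises AttributeError
try:
    empty.title.string
except AttributeError as exc:
    print("AttributeError:", exc)
```

This distinction matters for error handling. A missing tag by itself gives None, and the exception only appears when code keeps navigating from that None.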
Adding exception handling
Replace the code in 2.py with the following:
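#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup


def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:  # the server answered with an error status (404, 500, ...)
        return "HTTP error"
    except URLError as e:  # the server could not be reached at all (bad domain, no network)
        return "URL error"
    try:
        bsObj = BeautifulSoup(html.read(), 'lxml')
        title = bsObj.title  # None (no exception) if the page has no <title> tag
    except AttributeError as e:  # raised by chained access on a missing tag, e.g. bsObj.body.h1
        return "tag error"
    return title


title = getTitle('http://www.toutiao.com')
if title is None:
    print("title not found")
else:
    print(title)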
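urlopen can raise two related exceptions from urllib.error:

- HTTPError: the server replied with an error status.
- URLError: the server could not be reached.

HTTPError is a subclass of URLError. If both are caught, the HTTPError clause has to come first, or the URLError clause would swallow it. The following is a small offline sketch of that ordering. It builds the exceptions by hand instead of making real requests. The function name classify and the URL are hypothetical.

```python
from urllib.error import HTTPError, URLError


def classify(exc):
    """Hypothetical helper: raise `exc` and report which except clause caught it."""
    try:
        raise exc
    except HTTPError as e:  # must precede URLError, since HTTPError subclasses it
        return f"HTTP error {e.code}"
    except URLError as e:
        return f"URL error: {e.reason}"


# HTTPError(url, code, msg, hdrs, fp) built by hand; no request is sent
http_err = HTTPError("http://example.com/missing", 404, "Not Found", None, None)
url_err = URLError("name resolution failed")

print(issubclass(HTTPError, URLError))
print(classify(http_err))
print(classify(url_err))
```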