The original code is as follows:
from bs4 import BeautifulSoup
with open('/Users/jkxuan/Desktop/1_2answer_of_homework/1_2_homework_required/index.html', 'r') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')
    #image = Soup.select('body > div:nth-of-type(2) > div > div.col-md-9 > div:nth-of-type(2) > div:nth-of-type(1) > div > img')
    print(Soup)
Running it raises the following error:
/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/jkxuan/Desktop/1_2answer_of_homework/1.2.py
Traceback (most recent call last):
  File "/Users/jkxuan/Desktop/1_2answer_of_homework/1.2.py", line 4, in <module>
    Soup = BeautifulSoup(wb_data, 'lxml')
  File "/usr/local/lib/python3.6/site-packages/bs4/__init__.py", line 191, in __init__
    markup = markup.read()
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 16338: ordinal not in range(128)
Process finished with exit code 1
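Read the error message closely. ASCII is the American Standard Code for Information Interchange. The message says that the 'ascii' codec (the encoder/decoder) can't decode the byte 0xc2 at position 16338, because its ordinal (numeric value) is not in range(128), that is, it falls outside 0 to 127.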
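To see what that message means in isolation, here is a minimal sketch that triggers the same kind of failure without reading any file. The byte string is made up for illustration (0xc2 0xa0 is the UTF-8 encoding of a non-breaking space, which is common in HTML); it is an assumption, not the actual content of my index.html.

```python
# Hypothetical sample bytes: ASCII text followed by a UTF-8
# non-breaking space (0xc2 0xa0). Not the real contents of index.html.
data = b"price:\xc2\xa0100"

error = None
try:
    # ASCII only covers byte values 0..127, so 0xc2 cannot be decoded.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with an encoding that matches the bytes succeeds.
text = data.decode("utf-8")
print(repr(text))
```

The exception object also carries the offending position (error.start), which is the same number the traceback reports after "in position".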
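After some searching I found the cause: the file was being decoded as ASCII, but it contains bytes that are not valid ASCII, most likely because the web page contains Chinese or other non-ASCII characters. Because open() was called without an encoding argument, Python fell back to the platform default, which in my environment was ASCII. The fix is to change the second line of the code and add encoding="gb2312". The value must match the encoding the file was actually saved in, so if the page declares a different charset (for example utf-8), use that instead.
The corrected line is below: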
with open('/Users/jkxuan/Desktop/1_2answer_of_homework/1_2_homework_required/index.html', 'r', encoding="gb2312") as wb_data:
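If you are not sure which encoding the file uses, one possible approach is to read the raw bytes and try a few candidates in order. The sketch below is only an illustration: read_html_text is a hypothetical helper name, and the candidate list ("utf-8" first, then "gb2312") is my assumption. Note that a decode that succeeds is not proof the encoding is correct; UTF-8 is tried first because it rejects most byte sequences that were not written as UTF-8.

```python
from pathlib import Path


def read_html_text(path, encodings=("utf-8", "gb2312")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    raise ValueError(f"none of {encodings} could decode {path}")
```

The returned text can then be passed to BeautifulSoup(text, 'lxml') in place of the open file object.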
That's it, problem solved. Reference link