A Guide to BeautifulSoup: Overview

Reprints must credit the source: Jianshu @Orca_J35 | GitHub @orca-j35. All notes are hosted in the python_notes repository; stars are welcome!

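Overview

Note: The official documentation mixes Py2 and Py3 terminology and code. These notes reorganize its content for Py3. While learning BeautifulSoup, it is best to read these notes alongside the official documentation.

Beautiful Soup is a Python library for extracting data from HTML or XML files. It works with the parser of your choice and lets you navigate, search, and modify the parse tree in a familiar way.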

Related resources:

Home: https://www.crummy.com/software/BeautifulSoup/
PyPI: https://pypi.org/project/beautifulsoup4/
Docs-EN: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Docs-CN: https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

Installation:

pip install beautifulsoup4

If you run into installation problems, see:

Installing Beautiful Soup
Problems after installation

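If the following code runs without error, the installation succeeded (the example uses the lxml parser, which must be installed separately with pip install lxml):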

from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>Hello</p>', 'lxml')
print(soup.p.string) #> Hello

Note: The name used to install a library is not necessarily the name used to import it. For example, BeautifulSoup4 is installed as beautifulsoup4 but imported as bs4 (located at ~\Python\Lib\site-packages\bs4).
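As a quick check of the install name versus the import name, the short sketch below imports the package as bs4 and prints its version string. It assumes only that beautifulsoup4 is installed; the variable name version is illustrative.

```python
import bs4

# The distribution is installed as "beautifulsoup4" but imported as "bs4".
version = bs4.__version__
print(version)
```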

If you run into a problem not covered here, see: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#troubleshooting

Three sisters

The following document, named "Three sisters", is the HTML sample used throughout these notes (the official documentation uses the same snippet):

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a  class="sister" id="link1">Elsie</a>,
<a  class="sister" id="link2">Lacie</a> and
<a  class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

This HTML document contains "tag soup" (for example, the <body> and <html> tags are never closed); the HTML parser repairs it automatically.

Improving performance

BeautifulSoup will never be faster than the parser it sits on top of. If speed is critical, parse with the lxml library directly.

Within BeautifulSoup, the lxml parser is faster than html.parser or html5lib.

Installing the cchardet library can speed up encoding detection significantly.

Parsing only part of a document does not save much parsing time, but it can save a lot of memory and make searching the document much faster.

BeautifulSoup()

BeautifulSoup(self, markup="", features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs)

The parameters of the BeautifulSoup() constructor are as follows:

markup - The markup to parse; either a string or a file-like object.

from bs4 import BeautifulSoup

with open("index.html") as fp:
    soup = BeautifulSoup(fp)

soup = BeautifulSoup("<html>data</html>")

features - Selects the parser. Pass either a parser name ("lxml", "lxml-xml", "html.parser", "html5lib") or a markup type ("html", "html5", "xml"). Naming the parser explicitly is recommended so that BeautifulSoup gives the same results across platforms and virtual environments.

By default, BeautifulSoup parses documents as HTML. To parse a document as XML, set features='xml'. lxml is currently the only supported XML parser.

If no parser is specified, BeautifulSoup picks the best HTML parser among those installed, in this order of priority: lxml’s HTML parser > html5lib's parser > Python’s html.parser.

If you specify a parser that is not installed, the official documentation says BeautifulSoup ignores the request and picks a parser by priority. Recent bs4 releases appear to raise bs4.FeatureNotFound instead, so check the behavior of the version you have installed.
builder - Not normally needed (A specific TreeBuilder to use instead of looking one up based on features).
parse_only - A SoupStrainer object. Only the parts of the document that match the SoupStrainer are considered during parsing. This is useful when you need only part of a document, for example when the document is too large to fit in memory.
from_encoding - A string naming the encoding of the document. Pass it when BeautifulSoup guesses the encoding wrong.
exclude_encodings - A list of strings naming encodings known to be wrong. Pass it when you do not know the document's encoding but do know that BeautifulSoup's guess is wrong.
**kwargs - For backward compatibility, the constructor accepts some keyword arguments used in BeautifulSoup3, but they do nothing in BeautifulSoup4.
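To make the from_encoding parameter concrete, here is a minimal sketch that parses Latin-1 bytes with an explicitly named parser and encoding. The sample bytes and the variable names raw and text are made up for illustration; html.parser is used so that no third-party parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical input: a short document stored as Latin-1 bytes.
raw = "<p>caf\u00e9</p>".encode("latin-1")

# Name both the parser and the encoding instead of relying on guesses.
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")
text = soup.p.string
print(text)
print(soup.original_encoding)
```

Naming the encoding skips the detection step for this input, which is the point of the parameter.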

Parsers

Beautiful Soup supports the HTML parser in the Python standard library as well as several third-party parsers (such as lxml):

Python’s html.parser - BeautifulSoup(markup,"html.parser")
lxml’s HTML parser - BeautifulSoup(markup, "lxml")
lxml’s XML parser - BeautifulSoup(markup, "lxml-xml") 或 BeautifulSoup(markup, "xml")
html5lib - BeautifulSoup(markup, "html5lib")

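By default, BeautifulSoup parses documents as HTML. To parse a document as XML, set features='xml'. lxml is currently the only supported XML parser.

If no parser is specified, BeautifulSoup picks the best HTML parser among those installed, in this order of priority: lxml’s HTML parser > html5lib's parser > Python’s html.parser.

If you specify a parser that is not installed, the official documentation says BeautifulSoup ignores the request and picks a parser by priority. Recent bs4 releases appear to raise bs4.FeatureNotFound instead, so check the behavior of the version you have installed.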
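What happens when a requested parser is unavailable can be checked directly. The sketch below is an assumption-laden probe, not a statement of the library's contract: try_parser is a hypothetical helper, and "no-such-parser" is a made-up name standing in for a parser that is not installed. On recent bs4 releases the request is expected to raise bs4.FeatureNotFound.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def try_parser(name):
    """Report what happens when BeautifulSoup is asked for parser `name`."""
    try:
        BeautifulSoup("<p>Hello</p>", name)
    except FeatureNotFound:
        return "FeatureNotFound"
    return "ok"


# "no-such-parser" is a made-up name used only for illustration.
missing = try_parser("no-such-parser")
builtin = try_parser("html.parser")
print(missing, builtin)
```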

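Installation instructions and a comparison of the third-party parsers: Installing a parser

Using the lxml parser is recommended for speed. On Python versions earlier than 2.7.3 or 3.2.2, you must use lxml or html5lib, because the built-in HTML parser in those versions is not reliable enough.

Note: If you try to parse an invalid HTML/XML document, different parsers may give different results.

For the specific differences between parsers, see: Specifying the parser to use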

Parsing XML documents

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#parsing-xml

By default, BeautifulSoup parses documents as HTML. To parse a document as XML, set features='xml'. lxml is currently the only supported XML parser.

Encodings

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings

An HTML or XML document may use any of several encodings (such as ASCII or UTF-8). Once the document is loaded into BeautifulSoup, it is automatically converted to Unicode.

markup = b"<h1>Sacr\xc3\xa9 bleu!</h1>"  # UTF-8 encoded bytes
soup = BeautifulSoup(markup, 'lxml')
print(soup.h1)
#> <h1>Sacré bleu!</h1>
print(soup.h1.string)
#> Sacré bleu!

BeautifulSoup uses a sub-library called Unicode, Dammit to detect a document's encoding and convert it to Unicode. The .original_encoding attribute of the BeautifulSoup object records the encoding that was detected:

print(soup.original_encoding)
#> utf-8

Most of the time Unicode, Dammit guesses the encoding correctly, but it occasionally gets it wrong. Even when the guess is right, it may only arrive at the answer after scanning the document byte by byte, which is slow. If you already know the document's encoding, pass it through the from_encoding parameter to avoid both mistakes and delays.

Here’s a document written in ISO-8859-8. The document is so short that Unicode, Dammit can’t get a lock on it, and misidentifies it as ISO-8859-7:

>>> markup = b"<h1>\xed\xe5\xec\xf9</h1>"
>>> soup = BeautifulSoup(markup)
>>> soup.h1
<h1>νεμω</h1>
>>> soup.original_encoding
'ISO-8859-7'

We can fix this by passing in the correct from_encoding:

>>> soup = BeautifulSoup(markup, from_encoding="iso-8859-8")
>>> soup.h1
<h1>םולש</h1>
>>> soup.original_encoding
'iso8859-8'

If you do not know the encoding but do know that Unicode, Dammit guessed wrong, use the exclude_encodings parameter to rule out specific encodings:

>>> soup = BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"])
>>> soup.h1
<h1>םולש</h1>
>>> soup.original_encoding
'WINDOWS-1255'

Windows-1255 isn’t 100% correct, but that encoding is a compatible superset of ISO-8859-8, so it’s close enough. (exclude_encodings is a new feature in Beautiful Soup 4.4.0.)

For more information, see: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings

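Parsing only part of a document

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#parsing-only-part-of-a-document

When you only need the <a> tags, the usual approach of parsing the whole document and then searching for <a> tags wastes a lot of time and memory. Ignoring everything unrelated to <a> tags from the start makes searching much faster.

When only part of a document needs to be parsed, use the SoupStrainer class to select which tags to keep.

Note: Parsing only part of a document does not save much parsing time, but it can save a lot of memory and make searching the document much faster.

Note: The html5lib parser does not support this feature, for the following reason: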

If you use html5lib, the whole document will be parsed, no matter what. This is because html5lib constantly rearranges the parse tree as it works, and if some part of the document didn’t actually make it into the parse tree, it’ll crash. To avoid confusion, in the examples below I’ll be forcing Beautiful Soup to use Python’s built-in parser.
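The memory argument for partial parsing can be made tangible by counting the nodes each tree holds. The sketch below is illustrative only: the shortened document and the names full, partial, full_nodes, and partial_nodes are assumptions, and node count is used as a rough stand-in for memory use.

```python
from bs4 import BeautifulSoup, SoupStrainer

html = """
<html><body>
<p class="story">Three sisters:
<a class="sister" id="link1">Elsie</a>,
<a class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>.</p>
</body></html>
"""

full = BeautifulSoup(html, "html.parser")
partial = BeautifulSoup(html, "html.parser", parse_only=SoupStrainer("a"))

# Count every node (tags and strings) held by each tree.
full_nodes = len(list(full.descendants))
partial_nodes = len(list(partial.descendants))
names = [a.get_text() for a in partial.find_all("a")]
print(full_nodes, partial_nodes, names)
```

The partial tree keeps only the <a> tags and their text, so any later search has far fewer nodes to walk.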

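SoupStrainer

The SoupStrainer() constructor takes the same arguments as the tree-search methods: name, attrs, text, **kwargs. In the bs4 version these notes were written against, text must not be written as string; for SoupStrainer(), text and string are not interchangeable (newer releases may differ).

Example - using SoupStrainer objects:

from bs4 import BeautifulSoup, SoupStrainer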

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a  class="sister" id="link1">Elsie</a>,
<a  class="sister" id="link2">Lacie</a> and
<a  class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
only_a_tags = SoupStrainer("a")
soup = BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags)
print([f'{type(i)}::{i.name}' for i in soup])
#> ["<class 'bs4.element.Tag'>::a", "<class 'bs4.element.Tag'>::a", "<class 'bs4.element.Tag'>::a"]


only_tags_with_id_link2 = SoupStrainer(id="link2")
soup = BeautifulSoup(
    html_doc, "html.parser", parse_only=only_tags_with_id_link2)
print([f'{type(i)}::{i}' for i in soup])
#> ['<class \'bs4.element.Tag\'>::<a class="sister"  id="link2">Lacie</a>']


def is_short_string(text: str):
    return len(text) < 10
only_short_strings = SoupStrainer(text=is_short_string)
soup = BeautifulSoup(html_doc, "html.parser", parse_only=only_short_strings)
print([repr(i) for i in soup])
#> ["'\\n'", "'\\n'", "'\\n'", "'\\n'", "'Elsie'", "',\\n'", "'Lacie'", "' and\\n'", "'Tillie'", "'\\n'", "'...'", "'\\n'"]

A SoupStrainer can also be passed as an argument to the tree-search methods. This is probably uncommon, but worth mentioning:

def is_short_string(text: str):
    return len(text) < 10


only_short_strings = SoupStrainer(text=is_short_string)
soup = BeautifulSoup(
    html_doc,
    "html.parser",
)
print([repr(i) for i in soup.find_all(only_short_strings)])
#> ["'\\n'", "'\\n'", "'\\n'", "'\\n'", "'Elsie'", "',\\n'", "'Lacie'", "' and\\n'", "'Tillie'", "'\\n'", "'...'", "'\\n'"]

Kinds of objects

Reference: Kinds of objects

BeautifulSoup transforms a complex HTML document into a complex tree of Python objects. Every node in the tree is a Python object, and there are four kinds you need to deal with: Tag, NavigableString, BeautifulSoup, Comment.

Tag

A Tag object corresponds to an XML or HTML tag in the original document.

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
print(type(tag))
# <class 'bs4.element.Tag'>

Tag objects have many attributes and methods, covered in detail in Navigating the tree and Searching the tree. This section covers only the two most important features of a Tag object.

name

Every Tag object has a name, accessible through the .name attribute:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
print(tag.name)
#> b

Changing a Tag object's .name is reflected in the HTML generated by the BeautifulSoup object:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
tag.name = "blockquote"
print(tag)
#> <blockquote class="boldest">Extremely bold</blockquote>
print(soup)
#> <html><body><blockquote class="boldest">Extremely bold</blockquote></body></html>

Attributes

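An HTML tag can have any number of attributes. For example, the tag <b id="boldest"> has an attribute named "id" whose value is "boldest".

A Tag object can be treated as a dictionary of the tag's attributes: the key-value pairs are attribute names and values, and it is used the same way as a dictionary. The dictionary itself is also available through the .attrs attribute.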

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b id="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
print(tag['id'])  #> boldest
print(tag.get('id'))  #> boldest
print(tag.attrs) #> {'id': 'boldest'}

A Tag object supports adding, removing, and modifying the tag's attributes:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b id="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
tag['id'] = 'verybold'
tag['another-attribute'] = 1
print(tag)
#> <b another-attribute="1" id="verybold">Extremely bold</b>
del tag['id']
del tag['another-attribute']
print(tag)
#> <b>Extremely bold</b>
print(tag.get('id', "Don't have"))
#> Don't have
print(tag['id'])
#> KeyError: 'id'

The .has_attr() method checks whether a Tag object has a given attribute:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b id="boldest">Extremely bold</b>', 'lxml')
print(soup.b.has_attr('id'))
#> True
print(soup.b.has_attr('class'))
#> False

Multi-valued attributes

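In HTML 4, some attributes can have multiple values. HTML 5 removed some of them but introduced a few more. The most common multi-valued attribute is class (an HTML tag can carry several CSS classes); other examples include rel, rev, accept-charset, headers, and accesskey.

BeautifulSoup represents the value of a multi-valued attribute as a list: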

from bs4 import BeautifulSoup
css_soup = BeautifulSoup('<p class="body"></p>', 'lxml')
print(css_soup.p['class'])
#> ['body']

css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'lxml')
print(css_soup.p['class'])
#> ['body', 'strikeout']

If an attribute looks like it has multiple values but is not defined as multi-valued in any version of the HTML standard, BeautifulSoup returns it as a single string:

id_soup = BeautifulSoup('<p id="my id"></p>', 'lxml')
print(id_soup.p['id'])
#> my id

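When a Tag is converted to a string, multiple attribute values are merged:

rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>', 'lxml')
print(rel_soup.a['rel'])
#> ['index']
rel_soup.a['rel'] = ['index', 'contents']
print(rel_soup.p)
#> <p>Back to the <a rel="index contents">homepage</a></p>

The .get_attribute_list() method returns the value of an attribute as a list, whether or not the attribute is multi-valued: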

id_soup = BeautifulSoup('<p class="body strikeout" id="my id"></p>', 'lxml')
print(id_soup.p['class'])
#> ['body', 'strikeout']
print(id_soup.p.get_attribute_list('class'))
#> ['body', 'strikeout']
print(id_soup.p['id'])
#> my id
print(id_soup.p.get_attribute_list('id'))
#> ['my id']

If the document is parsed as XML, there are no multi-valued attributes:

xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
print(xml_soup.p['class'])
#> body strikeout

NavigableString

bs4.element.NavigableString

NavigableString inherits from both str and PageElement. The string inside a NavigableString cannot be edited in place, but it can be replaced with the replace_with() method:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
tag.string.replace_with("No longer bold")
print(tag)
#> <b class="boldest">No longer bold</b>

NavigableString supports most, but not all, of the features described in Navigating the tree and Searching the tree. Because a NavigableString can only contain a string and nothing else (a Tag can contain strings or child tags), it does not support the .contents or .string attributes or the find() method. Accessing .name on a NavigableString returns None.

To use the string in a NavigableString outside of BeautifulSoup, first call str() on it to convert it to an ordinary string. Otherwise the string keeps a reference to the entire BeautifulSoup parse tree, which wastes a lot of memory.
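The difference between the two kinds of string is easy to see in a short sketch. The variable names nav and plain are illustrative; html.parser is used so that no third-party parser is required.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup("<b>Extremely bold</b>", "html.parser")

nav = soup.b.string   # NavigableString, still attached to the parse tree
plain = str(nav)      # ordinary str with no link back to the tree

print(type(nav).__name__, nav.parent.name)
print(type(plain).__name__, hasattr(plain, "parent"))
```

Only nav can reach the rest of the tree through .parent; plain is safe to keep around after the soup is discarded.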

A NavigableString object can be obtained through the .string attribute; see the .string section for details.

BeautifulSoup

The BeautifulSoup object represents the whole document. For most purposes you can treat it as a Tag object. It supports most of the features described in Navigating the tree and Searching the tree.

Because the BeautifulSoup object does not correspond to an actual HTML/XML tag, its name attribute is '[document]' and it has no HTML attributes.

from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
print(type(soup))
#> <class 'bs4.BeautifulSoup'>
print(soup.name)
#> [document]

Comments and other special strings

Tag, NavigableString, and BeautifulSoup cover almost everything you will see in an HTML or XML file, but a few things are left over, such as comments:

from bs4 import BeautifulSoup
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, 'lxml')
comment = soup.b.string
print(type(comment))
#> <class 'bs4.element.Comment'>
print(comment)
#> Hey, buddy. Want to buy a used parser?

The Comment class inherits from PreformattedString, which in turn inherits from NavigableString. In other words, Comment is a special kind of NavigableString.
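Because Comment is a NavigableString subclass, comments can be picked out of a tree with an isinstance check. The sketch below is one common way to do that, not the only one; the sample markup and the names is_subclass and comments are made up for illustration.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

markup = "<p>Visible text<!-- hidden note --></p>"
soup = BeautifulSoup(markup, "html.parser")

# Comment sits below NavigableString in the class hierarchy.
is_subclass = issubclass(Comment, NavigableString)

# Keep only the strings that are Comment instances.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print(is_subclass, [str(c) for c in comments])
```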

However, when a comment appears as part of an HTML document, the Comment object is output with special formatting:

from bs4 import BeautifulSoup
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, 'lxml')
print(soup.b.prettify())
'''Out:
<b>
 <!--Hey, buddy. Want to buy a used parser?-->
</b>
'''

BeautifulSoup also defines classes for other things that may appear in an XML document:

CData
ProcessingInstruction
Declaration
Doctype

Like Comment, these classes are subclasses of NavigableString with a few extensions. The following example replaces a Comment with a CDATA block:

from bs4 import BeautifulSoup
from bs4 import CData
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, 'lxml')
cdata = CData("A CDATA block")
comment = soup.b.string
comment.replace_with(cdata)
print(soup.b.prettify())
'''Out:
<b>
 <![CDATA[A CDATA block]]>
</b>
'''

Comparing objects for equality

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#copying-beautiful-soup-objects

Beautiful Soup says that two NavigableString or Tag objects are equal when they represent the same HTML or XML markup. In this example, the two <b> tags are treated as equal, even though they live in different parts of the object tree, because they both look like “<b>pizza</b>”:

markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
soup = BeautifulSoup(markup, 'html.parser')
first_b, second_b = soup.find_all('b')
print(first_b == second_b)
# True

print(first_b.previous_element == second_b.previous_element)
# False

If you want to see whether two variables refer to exactly the same object, use is:

print(first_b is second_b)
# False

Copying BeautifulSoup objects

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#copying-beautiful-soup-objects

You can use copy.copy() to create a copy of any Tag or NavigableString:

import copy
p_copy = copy.copy(soup.p)
print(p_copy)
# <p>I want <b>pizza</b> and more <b>pizza</b>!</p>

The copy is considered equal to the original, since it represents the same markup as the original, but it’s not the same object:

print(soup.p == p_copy)
# True

print(soup.p is p_copy)
# False

The only real difference is that the copy is completely detached from the original Beautiful Soup object tree, just as if extract() had been called on it:

print(p_copy.parent)
# None

This is because two different Tag objects can’t occupy the same space at the same time.
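One practical consequence of the copy being detached is that it can be edited without touching the original tree. The sketch below illustrates this under the assumption that copy.copy() also copies the tag's children; the markup and the replacement text "salad" are made up.

```python
import copy

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>I want <b>pizza</b></p>", "html.parser")
p_copy = copy.copy(soup.p)

# Edit the copy only; the original tree should be unaffected.
p_copy.b.string = "salad"

original_html = str(soup.p)
copy_html = str(p_copy)
print(original_html)
print(copy_html)
print(p_copy.parent)
```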

Output

Further reading: Output encoding

BeautifulSoup supports both Py2 and Py3, but the str type differs between the two, which leads to differences in output. Keep this in mind when reading the output below.

.decode()

This method converts the contents of a BeautifulSoup or Tag object to a Unicode string.

The docstring in the source code reads:

def decode(self, indent_level=None,
           eventual_encoding=DEFAULT_OUTPUT_ENCODING,
           formatter="minimal"):
    """Returns a Unicode representation of this tag and its contents.

        :param eventual_encoding: The tag is destined to be
           encoded into this encoding. This method is _not_
           responsible for performing that encoding. This information
           is passed in so that it can be substituted in if the
           document contains a <META> tag that mentions the document's
           encoding.
        """

In Py3, decode() returns a str object (a Unicode string):

# in Python3
from bs4 import BeautifulSoup
markup = '<a 
soup = BeautifulSoup(markup, 'lxml')
print(type(soup.decode()))
#> <class 'str'>
print(soup.decode())
#> <html><body><a >連接到<i>example.com</i></a></body></html>

In Py2, decode() returns a unicode object (a Unicode string):

# in Python2
>>> markup = u'<a 
>>> soup = BeautifulSoup(markup, 'lxml')
>>> print(type(soup.decode()))
<type 'unicode'>
>>> print(soup.decode())
<html><body><a >連接到<i>example.com</i></a></body></html>

.encode()

This method first converts the data structure to a Unicode string and then encodes that string with the given encoding, UTF-8 by default. The source code is as follows:

def encode(self, encoding=DEFAULT_OUTPUT_ENCODING,
           indent_level=None, formatter="minimal",
           errors="xmlcharrefreplace"):
    # Turn the data structure into Unicode, then encode the
    # Unicode.
    u = self.decode(indent_level, encoding, formatter)
    return u.encode(encoding, errors)
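The decode-then-encode sequence, and the effect of errors="xmlcharrefreplace", can be seen with a single non-ASCII character. This is a small illustration with made-up markup, using html.parser so that no <html> or <body> wrapper is added to the output.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>caf\u00e9</p>", "html.parser")

as_text = soup.decode()          # str
as_utf8 = soup.encode()          # bytes, UTF-8 by default
as_ascii = soup.encode("ascii")  # unencodable characters become character references

print(as_text)
print(as_utf8)
print(as_ascii)
```

With the ASCII target, the character that cannot be encoded is written as a numeric character reference instead of raising an error.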

In Py3, encode() returns a bytes object encoded with encoding (UTF-8 by default):

# in Python3
from bs4 import BeautifulSoup
markup = '<a 
soup = BeautifulSoup(markup, 'lxml')
print(type(soup.encode()))
#> <class 'bytes'>
print(soup.encode())
#> b'<html><body><a

In Py2, encode() returns a str object encoded with encoding (UTF-8 by default); note that str is not the same type in Py2 and Py3:

# in Python2
>>> markup = u'<a 
>>> soup = BeautifulSoup(markup, 'lxml')
>>> print(soup.encode())
<html><body><a >連接到<i>example.com</i></a></body></html>
>>> soup.encode()
'<html><body><a 
>>> type(soup.encode())
<type 'str'>

.prettify()

Reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#pretty-printing

prettify(self, encoding=None, formatter="minimal")

When encoding is None, prettify() converts the BeautifulSoup parse tree to a nicely formatted Unicode string in which every HTML/XML tag and every string gets its own line. When encoding is not None, prettify() encodes the parse tree as a nicely formatted bytes object.

The source code of prettify() is as follows:

# Source code of prettify()
def prettify(self, encoding=None, formatter="minimal"):
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)
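The two branches in the source above correspond to two return types, which the following sketch checks. The markup and the names pretty_text and pretty_bytes are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hi</p>", "html.parser")

pretty_text = soup.prettify()                   # encoding is None: returns str
pretty_bytes = soup.prettify(encoding="utf-8")  # encoding given: returns bytes

print(type(pretty_text).__name__, type(pretty_bytes).__name__)
print(pretty_text)
```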

Example - in Py3:

# in Python3
from bs4 import BeautifulSoup
markup = '<a >I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
print(type(soup.prettify()))
#> <class 'str'>
print(soup.prettify())
'''Out:
<html>
 <body>
  <a >
   I linked to
   <i>
    example.com
   </i>
  </a>
 </body>
</html>
'''

prettify() works on both BeautifulSoup objects and Tag objects:

print(soup.a.prettify())
'''Out:
<a >
 I linked to
 <i>
  example.com
 </i>
</a>
'''

Example - in Py2:

# in Python2
from bs4 import BeautifulSoup
markup = u'<a 
soup = BeautifulSoup(markup, 'lxml')
print(soup.prettify())
'''Out:
<html>
 <body>
  <a >
   I linked to
   <i>
    example.com
   </i>
  </a>
 </body>
</html>
'''

The formatter parameter

Reference: Output formatters

If the document passed to BeautifulSoup() contains HTML entities, they are converted to Unicode characters when the document is output:

# in Python3
from bs4 import BeautifulSoup
soup = BeautifulSoup("&ldquo;Dammit!&rdquo; he said.", 'lxml')
print(soup)
#> <html><body><p>“Dammit!” he said.</p></body></html>

When the document is encoded to a bytes object, encode() first converts the HTML content to a Unicode string (at which point HTML entities become Unicode characters) and then encodes that string to bytes, using UTF-8 by default. The HTML entities therefore end up encoded as Unicode characters.

# in Python3
# Note how the HTML entities change
from bs4 import BeautifulSoup
soup = BeautifulSoup("&ldquo;Dammit!&rdquo; he said.", 'lxml')
print(soup.encode())
#> b'<html><body><p>\xe2\x80\x9cDammit!\xe2\x80\x9d he said.</p></body></html>'

print('“'.encode('utf-8'))
#> b'\xe2\x80\x9c'

By default, to make sure BeautifulSoup does not inadvertently generate invalid HTML or XML, bare ampersands (&) and angle brackets in the output Unicode string are written as HTML entities:

# A bare & is output as &amp;   &amp; stays as-is
# A bare < is output as &lt;    &lt; stays as-is
# A bare > is output as &gt;    &gt; stays as-is

# in Python3
from bs4 import BeautifulSoup
soup = BeautifulSoup(
    "<p>The law firm of Dewey, Cheatem, > &gt; < &lt; & &amp; Howe</p>",
    'lxml')
p = soup.p
print(p)
#> <p>The law firm of Dewey, Cheatem, &gt; &gt; &lt; &lt; &amp; &amp; Howe</p>
soup = BeautifulSoup(
    '<a >A link</a>', 'lxml')
print(soup.a)
#> <a >A link</a>

To change how HTML entities are rendered, pass a formatter argument to prettify(), encode(), or decode(). formatter accepts six kinds of value; the default is formatter="minimal". Note that __str__(), __unicode__(), and __repr__() always use the default behavior, which cannot be changed.

minimal

With formatter="minimal", strings are processed by the rules described above, just enough to ensure valid HTML/XML:

# in Python3
from bs4 import BeautifulSoup
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'lxml')
print(soup.prettify(formatter="minimal"))
'''Out:
<html>
 <body>
  <p>
   Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
  </p>
 </body>
</html>'''

html

With formatter="html", BeautifulSoup converts Unicode characters to HTML entities whenever possible:

# in Python3
from bs4 import BeautifulSoup
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt; é</p>"
soup = BeautifulSoup(french, 'lxml')
print(soup.prettify(formatter="html"))
'''Out:
<html>
 <body>
  <p>
   Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt; &eacute;
  </p>
 </body>
</html>'''

# If you pass in ``formatter="html5"``, it's the same as

html5

With formatter="html5", BeautifulSoup omits the closing slash in HTML void tags, for example:

# in Python3
from bs4 import BeautifulSoup
soup = BeautifulSoup("<br>", 'lxml')
print(soup.encode(formatter="html"))
# <html><body><br/></body></html>
print(soup.encode(formatter="html5"))
# <html><body><br></body></html>

None

With formatter=None, BeautifulSoup does not modify strings at all on output. This is the fastest option, but it may lead BeautifulSoup to generate invalid HTML/XML, for example:

# in Python3
from bs4 import BeautifulSoup
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'lxml')
print(soup.prettify(formatter=None))
'''Out:
<html>
 <body>
  <p>
   Il a dit <<Sacré bleu!>>
  </p>
 </body>
</html>
'''

link_soup = BeautifulSoup('<a >A link</a>')
print(link_soup.a.encode(formatter=None))
# <a >A link</a>

Functions

You can also pass a function as formatter. BeautifulSoup calls it once for every string and every attribute value in the document, and the function can do whatever you like. The following formatter function converts strings and attribute values to upper case and does nothing else:

# in Python3
from bs4 import BeautifulSoup

def uppercase(str):
    return str.upper()

french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'lxml')
print(soup.prettify(formatter=uppercase))
'''Out:
<html>
 <body>
  <p>
   IL A DIT <<SACRé BLEU!>>
  </p>
 </body>
</html>'''

link_soup = BeautifulSoup(
    '<a >A link</a>', 'lxml')
print(link_soup.a.prettify(formatter=uppercase))
'''Out:
<a >
 A LINK
</a>
'''

If you are writing your own formatter function, you should know about the EntitySubstitution class in the bs4.dammit module. It implements BeautifulSoup's standard formatters as class methods:

'html' formatter 對(duì)應(yīng)于 EntitySubstitution.substitute_html
'minimal' formatter 對(duì)應(yīng)于 EntitySubstitution.substitute_xml

You can use these functions to simulate formatter="html" or formatter="minimal" and then add whatever extra behavior you need.

The following example converts Unicode characters to HTML entities whenever possible and also converts all strings to upper case:

from bs4 import BeautifulSoup
from bs4.dammit import EntitySubstitution

def uppercase_and_substitute_html_entities(str):
    return EntitySubstitution.substitute_html(str.upper())

french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt; é</p>"
soup = BeautifulSoup(french, 'lxml')
print(soup.prettify(formatter=uppercase_and_substitute_html_entities))
'''Out:
<html>
 <body>
  <p>
   IL A DIT &lt;&lt;SACR&Eacute; BLEU!&gt;&gt; &Eacute;
  </p>
 </body>
</html>
'''

CData objects

Once a CData object is created, the text inside it is always output exactly as it appears, with no formatting applied.

Beautiful Soup will call the formatter method, just in case you’ve written a custom method that counts all the strings in the document or something, but it will ignore the return value:

from bs4.element import CData
soup = BeautifulSoup("<a></a>")
soup.a.string = CData("one < three")
print(soup.a.prettify(formatter="xml")) # Open question: what does "xml" mean here?
# <a>
#  <![CDATA[one < three]]>
# </a>

Non-pretty printing

If you just want the resulting string and do not care about formatting, you can call the following methods on a BeautifulSoup or Tag object:

__unicode__() - Corresponds to the built-in unicode(); Py2 only
__str__() - Corresponds to the built-in str(). Because str in Py2 is not a Unicode string, str() produces different output in Py2 and Py3
__repr__() - Corresponds to the built-in repr(). Because str in Py2 is not a Unicode string, repr() produces different output in Py2 and Py3

The source code of these three methods is as follows:

def __repr__(self, encoding="unicode-escape"):
    """Renders this tag as a string."""
    if PY3K:
        # "The return value must be a string object", i.e. Unicode
        return self.decode()
    else:
        # "The return value must be a string object", i.e. a bytestring.
        # By convention, the return value of __repr__ should also be
        # an ASCII string.
        return self.encode(encoding)

def __unicode__(self):
    return self.decode()

def __str__(self):
    if PY3K:
        return self.decode()
    else:
        return self.encode()

if PY3K:
    __str__ = __repr__ = __unicode__

In Py3, the three methods are fully equivalent, and each returns a str object (a Unicode string):

# in Python3
markup = '<a >I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
print(soup) # calls the __str__ method
#> <html><body><a >I linked to <i>example.com</i></a></body></html>

In Py2, str() returns a UTF-8 encoded str object (for more on encodings, see Encodings):

# in Python2
>>> markup = u'<a >I linked to 示例<i>example.com</i></a>'
>>> soup = BeautifulSoup(markup, 'lxml')
>>> str(soup)
'<html><body><a >I linked to \xe7\xa4\xba\xe4\xbe\x8b<i>example.com</i></a></body></html>'

In Py2, repr() returns a str object encoded with unicode-escape (see Text Encodings):

# in Python2
>>> markup = u'<a >I linked to 示例<i>example.com</i></a>'
>>> soup = BeautifulSoup(markup, 'lxml')
>>> repr(soup) # ASCII-encoded, with non-ASCII characters written as escaped Unicode literals
'<html><body><a >I linked to \\u793a\\u4f8b<i>example.com</i></a></body></html>'

.get_text()

get_text(self, separator="", strip=False, types=(NavigableString, CData))

If you only need the text of a document or tag, use the get_text() method. Its source code is as follows:

def get_text(self, separator="", strip=False,
             types=(NavigableString, CData)):
    """
        Get all child strings, concatenated using the given separator.
        """
    return separator.join([s for s in self._all_strings(
        strip, types=types)])

The method joins all the text in the document or tag into a single Unicode string and returns it:

from bs4 import BeautifulSoup
markup = '<a >\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'lxml')

print(soup.get_text())
print(soup.i.get_text())

Output:


I linked to example.com

example.com

The separator parameter sets the string used to join the pieces of text:

from bs4 import BeautifulSoup
markup = '<a >\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'lxml')

print(repr(soup.get_text('|')))
#> '\nI linked to |example.com|\n'

The strip parameter controls whether whitespace is stripped from the beginning and end of each piece of text:

from bs4 import BeautifulSoup
markup = '<a >\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'lxml')

print(repr(soup.get_text('|', strip=True)))
#> 'I linked to|example.com'

If you want to process the text yourself, use the .stripped_strings generator, which yields each piece of text one at a time:

from bs4 import BeautifulSoup
markup = '<a >\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'lxml')

print([text for text in soup.stripped_strings])
#> ['I linked to', 'example.com']

.text

The source code of the text attribute is as follows:

text = property(get_text)
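Since text is just a property wrapping get_text(), the two should always agree when get_text() is called with its default arguments. A minimal check, with made-up markup and variable names:

```python
from bs4 import BeautifulSoup

markup = "<a>\nI linked to <i>example.com</i>\n</a>"
soup = BeautifulSoup(markup, "html.parser")

via_property = soup.a.text
via_method = soup.a.get_text()
print(repr(via_property))
```

Use get_text() directly when a separator or strip=True is needed, since the property accepts no arguments.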

Copyright belongs to the author. For reprints or content partnerships, please contact the author.
