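1. What is Elasticsearch
Looking up data means searching, and searching means a search engine. Baidu and Google are enormous, complex search engines that index nearly every public web page and dataset on the internet. For our own business data we don't need anything that elaborate. If we want to build our own search engine for convenient storage and retrieval, Elasticsearch is the natural choice: it is a full-text search engine that can quickly store, search, and analyze massive amounts of data.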
2. Why use Elasticsearch
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library.
So what is Lucene? Lucene is arguably the most advanced, high-performance, full-featured search engine library in existence, open source or proprietary, but it is only a library. To use Lucene you have to write Java and pull in the Lucene package, and you need some understanding of information retrieval to see how it works. In short, it is not easy to use.
Elasticsearch was created to solve this problem. It is also written in Java and uses Lucene internally for indexing and searching, but its goal is to make full-text search simple. It acts as a layer on top of Lucene and exposes a simple, consistent RESTful API for storage and retrieval.
So is Elasticsearch merely a simplified wrapper around Lucene? Not at all. Elasticsearch is more than Lucene, and more than a full-text search engine. It can be accurately described as:
A distributed real-time document store in which every field can be indexed and searched
A distributed real-time analytics search engine
Capable of scaling to hundreds of server nodes and handling petabytes of structured or unstructured data
In short, it is a seriously capable search engine: Wikipedia, Stack Overflow, and GitHub all use it for search.
Installing Elasticsearch
We can download Elasticsearch from the official site, https://www.elastic.co/downloads/elasticsearch, which also provides installation instructions.
Download and unpack the archive, then run bin/elasticsearch (Mac or Linux) or bin\elasticsearch.bat (Windows) to start Elasticsearch.
I use a Mac, where I recommend installing with Homebrew:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n14" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">brew install elasticsearch</pre>
By default Elasticsearch runs on port 9200. Opening http://localhost:9200/ in a browser shows something like this:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n16" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{
"name" : "atntrTf",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "e64hkjGtTp6_G2h1Xxdv5g",
"version" : {
"number": "6.2.4",
"build_hash": "ccec39f",
"build_date": "2018-04-12T20:37:28.497551Z",
"build_snapshot": false,
"lucene_version": "7.2.1",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
},
"tagline" : "You Know, for Search"
}</pre>
If you see this, Elasticsearch is installed and running. The response shows that my Elasticsearch version is 6.2.4. The version matters: any plugin installed later has to match it.
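Because plugin versions have to match, it can be handy to read the version programmatically. The following is a minimal sketch that parses an abbreviated copy of the response shown above; the variable names are illustrative, and in practice the text would come from an HTTP GET to http://localhost:9200/.

```python
import json

# Abbreviated copy of the response shown above (assumption: in a real script
# this string would be fetched from http://localhost:9200/).
response_text = '''
{
  "name": "atntrTf",
  "cluster_name": "elasticsearch",
  "version": {"number": "6.2.4", "lucene_version": "7.2.1"},
  "tagline": "You Know, for Search"
}
'''

info = json.loads(response_text)

# The full version string, and its major component for coarse compatibility checks.
es_version = info["version"]["number"]
major = int(es_version.split(".")[0])
print(es_version, major)
```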
Next, let's go over the basic concepts of Elasticsearch and how to work with it from Python.
Elasticsearch concepts
Elasticsearch has a few basic concepts, such as nodes, indices, and documents. They are explained one by one below; understanding them goes a long way toward getting comfortable with Elasticsearch.
Node and Cluster
Elasticsearch is essentially a distributed database that lets multiple servers work together, and each server can run multiple Elasticsearch instances.
A single Elasticsearch instance is called a node (Node). A group of nodes forms a cluster (Cluster).
Index
Elasticsearch indexes all fields and, after processing, writes them to an inverted index (Inverted Index). When looking up data, it searches that index directly.
The top-level unit of data management in Elasticsearch is therefore called an Index, roughly equivalent to a database in MySQL or MongoDB. Note that the name of every Index (that is, database) must be lowercase.
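To make the idea of an inverted index concrete, here is a toy sketch in plain Python. It is only an illustration of the data structure (term to document ids), not how Lucene implements it; the function name and the sample texts are made up, and tokenization is a naive lowercase whitespace split.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase, whitespace-separated term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical sample documents.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
}

inverted = build_inverted_index(docs)

# Looking up a term is a direct dictionary access instead of a scan over all documents.
print(sorted(inverted["search"]))
print(sorted(inverted["lucene"]))
```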
Document
A single record in an Index is called a Document. Many Documents make up an Index.
Documents are represented in JSON format.
Documents in the same Index are not required to share the same structure (schema), but keeping it consistent helps search efficiency.
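For illustration, a Document could look like the sketch below. The field names (title, url, date) follow the news examples used later in this article; the values are made up. In Python a Document is naturally written as a dict and serialized to JSON.

```python
import json

# Hypothetical document; only the field names come from the article's later examples.
doc = {
    "title": "Example headline",
    "url": "http://example.com/news/1",
    "date": "2011-12-16",
}

# This JSON text is the form in which the document travels to Elasticsearch.
doc_json = json.dumps(doc, ensure_ascii=False, indent=2)
print(doc_json)
```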
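Type
Documents can be grouped. In a weather Index, for example, they can be grouped by city (Beijing and Shanghai) or by weather (sunny and rainy). Such a group is called a Type. It is a virtual, logical grouping used to filter Documents, similar to a table in MySQL or a Collection in MongoDB.
Different Types should have similar structures (schemas). For example, the id field cannot be a string in one group and a number in another. This is one difference from tables in a relational database. Data of a completely different nature (such as products and logs) should be stored as two Indexes rather than as two Types in one Index (although that is possible).
According to Elastic's roadmap, 6.x allows only one Type per Index; Types were subsequently deprecated in 7.x and removed in 8.x.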
Fields
A Document is a JSON-like structure that contains many fields, each with its own value; together the fields make up the Document. They are analogous to the columns of a MySQL table.
In Elasticsearch, a document belongs to a type (Type), and types live inside an index (Index). A simple side-by-side comparison with a traditional relational database looks like this:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n39" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields</pre>
Those are the basic concepts in Elasticsearch; the comparison with relational databases makes them easier to grasp.
Connecting to Elasticsearch from Python
Elasticsearch exposes a set of RESTful APIs for storing and querying data. We could drive them with a tool like curl, but the command line is not that convenient, so here we go straight to working with Elasticsearch from Python.
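To give a feel for what the REST interface looks like underneath any client library, the sketch below builds two requests with the standard library without sending them. The helper name build_request is made up for this example, and the base URL assumes a local node on the default port; PUT /news (create an index) and GET /news/_search (search it) are the usual endpoints for these operations.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)

# Create an index named news, then search everything in it.
create_index = build_request("PUT", "/news")
search = build_request("GET", "/news/_search", {"query": {"match_all": {}}})

for req in (create_index, search):
    print(req.get_method(), req.full_url)
```

Sending either request would be a matter of passing it to urllib.request.urlopen against a running node; the Python client introduced next wraps exactly this kind of call.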
The Python library for Elasticsearch has the same name, and installing it is simple:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n45" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pip3 install elasticsearch</pre>
The official documentation is at https://elasticsearch-py.readthedocs.io/. Every usage can be looked up there, and the rest of this article is based on it.
Creating an Index
Let's first see how to create an index (Index). Here we create one named news:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" cid="n49" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.indices.create(index='news', ignore=400)
print(result)</pre>
If creation succeeds, the following result is returned:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n51" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}</pre>
The result is a dict parsed from the JSON response; acknowledged being True indicates that the create operation succeeded.
But if we run the code a second time, we get the following result:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n54" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{'error': {'root_cause': [{'type': 'resource_already_exists_exception', 'reason': 'index [news/QM6yz2W8QE-bflKhc5oThw] already exists', 'index_uuid': 'QM6yz2W8QE-bflKhc5oThw', 'index': 'news'}], 'type': 'resource_already_exists_exception', 'reason': 'index [news/QM6yz2W8QE-bflKhc5oThw] already exists', 'index_uuid': 'QM6yz2W8QE-bflKhc5oThw', 'index': 'news'}, 'status': 400}</pre>
It reports that creation failed with status code 400, because the Index already exists.
Note that our code passes ignore=400. This means that if the response status is 400, the error is ignored and the program does not raise an exception.
If we leave out the ignore parameter:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n58" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">es = Elasticsearch()
result = es.indices.create(index='news')
print(result)</pre>
Running it again raises an error:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n60" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'resource_already_exists_exception', 'index [news/QM6yz2W8QE-bflKhc5oThw] already exists')</pre>
That would interrupt the program. So we should make good use of the ignore parameter to rule out error cases we expect, so that the program keeps running instead of aborting.
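An alternative to ignoring the 400 is to ask whether the index exists before creating it. The sketch below assumes the client's es.indices.exists() and es.indices.create() methods; ensure_index is a made-up helper name, and FakeClient is a hypothetical in-memory stand-in so the example runs without a server.

```python
def ensure_index(es, name):
    """Create the index only if it is missing; return True if it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name)
    return True


class FakeIndices:
    """Hypothetical in-memory stand-in for es.indices, for illustration only."""

    def __init__(self):
        self.names = set()

    def exists(self, index):
        return index in self.names

    def create(self, index):
        self.names.add(index)
        return {"acknowledged": True, "index": index}


class FakeClient:
    """Hypothetical stand-in for an Elasticsearch client object."""

    def __init__(self):
        self.indices = FakeIndices()


client = FakeClient()
first = ensure_index(client, "news")   # index is missing, so it gets created
second = ensure_index(client, "news")  # index already exists, nothing happens
print(first, second)
```

With a real client the same helper would be called as ensure_index(Elasticsearch(), 'news'). Note that checking and creating are two separate requests, so another process could create the index in between; ignore=400 remains the simpler choice when that matters.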
Deleting an Index
Deleting an Index is similar. The code is:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n64" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.indices.delete(index='news', ignore=[400, 404])
print(result)</pre>
Here too the ignore parameter is used, so that a failed delete of a nonexistent Index does not abort the program.
If deletion succeeds, the output is:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n67" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{'acknowledged': True}</pre>
If the Index has already been deleted, deleting it again outputs:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n69" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{'error': {'root_cause': [{'type': 'index_not_found_exception', 'reason': 'no such index', 'resource.type': 'index_or_alias', 'resource.id': 'news', 'index_uuid': 'na', 'index': 'news'}], 'type': 'index_not_found_exception', 'reason': 'no such index', 'resource.type': 'index_or_alias', 'resource.id': 'news', 'index_uuid': 'na', 'index': 'news'}, 'status': 404}</pre>
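This result shows that the Index does not exist and the deletion failed. The result is again JSON, this time with status code 404. Because the ignore parameter we passed includes 404, the program runs normally and prints the JSON result instead of raising an exception.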
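Since ignored errors come back as ordinary results, the calling code has to inspect them itself. A minimal sketch, using abbreviated copies of the two results above; the helper name describe is made up for this example.

```python
def describe(result):
    """Summarize a response dict: 'ok', or the status code and error type."""
    if "error" in result:
        return "failed ({}): {}".format(result.get("status"), result["error"]["type"])
    return "ok"

# Abbreviated copies of the two outputs shown above.
deleted = {'acknowledged': True}
missing = {
    'error': {'type': 'index_not_found_exception', 'reason': 'no such index'},
    'status': 404,
}

print(describe(deleted))
print(describe(missing))
```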
Inserting data
Like MongoDB, Elasticsearch lets us insert structured dictionary data directly. To insert data we can call the create() method. For example, here we insert a news item:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" cid="n73" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index='news', ignore=400)
data = {'title': '美國留給伊拉克的是個爛攤子嗎', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm'}
result = es.create(index='news', doc_type='politics', id=1, body=data)
print(result)</pre>
Here we first define a news item with a title and a link, then insert it by calling create(). We pass four arguments: index is the index name, doc_type is the document type, body is the document content, and id is the unique identifier of the record.
The result is:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n76" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{'_index': 'news', '_type': 'politics', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}</pre>
The result field is created, which means the data was inserted successfully.
We can also insert data with the index() method. The difference is that create() requires an id to uniquely identify the record, while index() does not: if no id is given, one is generated automatically. A call to index() looks like this:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n79" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">es.index(index='news', doc_type='politics', body=data)</pre>
Conceptually, create() is index() with the operation type fixed to create: it only adds new documents and fails with a conflict if the id already exists.
Updating data
Updating data is also simple. We again specify the id and the content and call update(). The code is:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n83" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from elasticsearch import Elasticsearch
es = Elasticsearch()
data = {
    'title': '美國留給伊拉克的是個爛攤子嗎',
    'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm',
    'date': '2011-12-16'
}
# The update API expects the changed fields under a 'doc' key (or a script).
result = es.update(index='news', doc_type='politics', body={'doc': data}, id=1)
print(result)</pre>
Here we added a date field to the data and then called update(). The result is:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n85" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">
{'_index': 'news', '_type': 'politics', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}</pre>
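In the response, the result field is updated, which means the update succeeded. There is also a _version field, which holds the version number after the update. The value 2 means this is the second version: the data was inserted once before, so that first insert was version 1 (see the output of the previous example), and this update raised it to 2. Each later update increments the version by 1.
The same update can also be done with index():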
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n88" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">es.index(index='news', doc_type='politics', body=data, id=1)</pre>
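So index() covers both operations: if the document does not exist it is inserted, and if it already exists it is overwritten with the new body. That is convenient, as long as replacing the whole document is what you want.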
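One detail worth making explicit: indexing over an existing id replaces the whole document, whereas a partial update merges fields into it. The sketch below is a toy in-memory model of that difference, not the real client; toy_store, toy_index and toy_update are made-up names, and the sample values are hypothetical.

```python
toy_store = {}  # id -> document body; stands in for an index, for illustration only

def toy_index(doc_id, body):
    """Model of index(): insert the document, or replace it entirely if the id exists."""
    toy_store[doc_id] = dict(body)

def toy_update(doc_id, partial):
    """Model of a partial update: merge the given fields into the existing document."""
    toy_store[doc_id].update(partial)

toy_index(1, {"title": "headline", "url": "http://example.com/1"})
toy_update(1, {"date": "2011-12-16"})
after_update = dict(toy_store[1])    # title, url and date are all present

toy_index(1, {"date": "2011-12-17"})
after_reindex = dict(toy_store[1])   # only date remains: the old body was replaced

print(sorted(after_update), sorted(after_reindex))
```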
Deleting data
To delete a record, call delete() with the id of the record to remove:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n92" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.delete(index='news', doc_type='politics', id=1)
print(result)</pre>
The result is:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n94" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{'_index': 'news', '_type': 'politics', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}</pre>
The result field is deleted, which means the deletion succeeded, and _version has become 3, up by 1 again.
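The version numbers seen across the insert, update, and delete examples (1, 2, 3) can be reproduced with a small model. This is a toy in-memory sketch of the bookkeeping only, with a made-up class name; it is not how Elasticsearch stores versions internally.

```python
class VersionedStore:
    """Toy model of per-document versioning; not the real client."""

    def __init__(self):
        self.docs = {}      # id -> document body (absent once deleted)
        self.versions = {}  # id -> last version number, kept across deletes

    def _bump(self, doc_id):
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def index(self, doc_id, body):
        # Every write, whether insert or overwrite, produces a new version.
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(body)
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": result}

    def delete(self, doc_id):
        # A delete is also a write, so it increments the version too.
        del self.docs[doc_id]
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "deleted"}


store = VersionedStore()
r1 = store.index(1, {"title": "t"})
r2 = store.index(1, {"title": "t", "date": "2011-12-16"})
r3 = store.delete(1)
print(r1["_version"], r2["_version"], r3["_version"])
```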
Querying data
The operations above are all very simple and could be done with an ordinary database such as MongoDB, so nothing special so far. What sets Elasticsearch apart is its exceptionally powerful search capability.
For Chinese text we need to install a word segmentation (analysis) plugin. Here we use elasticsearch-analysis-ik, on GitHub at https://github.com/medcl/elasticsearch-analysis-ik. We install it with another Elasticsearch command-line tool, elasticsearch-plugin. The version installed here is 6.2.4; make sure it matches your Elasticsearch version. The command is:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n99" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.4/elasticsearch-analysis-ik-6.2.4.zip</pre>
Replace the version number here with the version of your own Elasticsearch.
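Substituting the version by hand is error-prone because it appears twice in the URL. A small sketch that fills it in; the helper name ik_plugin_url is made up, and it assumes other releases follow the same URL pattern as the 6.2.4 link above, which is worth checking on the project's releases page.

```python
def ik_plugin_url(es_version):
    """Build the plugin download URL for a given Elasticsearch version string."""
    # Assumption: every release uses the same path layout as the 6.2.4 link.
    return (
        "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/"
        "v{v}/elasticsearch-analysis-ik-{v}.zip"
    ).format(v=es_version)

url = ik_plugin_url("6.2.4")
print("elasticsearch-plugin install " + url)
```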
After installation, restart Elasticsearch and it will load the installed plugin automatically.
First we create a new index and specify the field to be segmented:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n103" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from elasticsearch import Elasticsearch
es = Elasticsearch()
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}
es.indices.delete(index='news', ignore=[400, 404])
es.indices.create(index='news', ignore=400)
result = es.indices.put_mapping(index='news', doc_type='politics', body=mapping)
print(result)</pre>
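Here we first delete the existing index, create a new one, and then update its mapping. The mapping specifies which field to analyze: the title field has type text, and both its analyzer and its search_analyzer are set to ik_max_word, which comes from the Chinese analysis plugin we just installed. If no analyzer is specified, Elasticsearch falls back to its default standard analyzer, which is not designed for Chinese word segmentation.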
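If several text fields need the same Chinese analyzer, the mapping can be generated rather than written out by hand. The following is a minimal sketch: the helper name build_ik_mapping and the extra summary field are hypothetical, and only the ik_max_word analyzer name comes from the plugin used above.

```python
def build_ik_mapping(fields, analyzer='ik_max_word'):
    """Build a mapping body that applies one analyzer to each text field.

    `build_ik_mapping` is a hypothetical helper for illustration. It only
    assembles a plain dict and does not talk to Elasticsearch.
    """
    return {
        'properties': {
            field: {
                'type': 'text',
                'analyzer': analyzer,
                'search_analyzer': analyzer,
            }
            for field in fields
        }
    }


# 'summary' is an assumed extra field, not part of the example above.
mapping = build_ik_mapping(['title', 'summary'])
print(mapping['properties']['title'])
```

The resulting dict has the same shape as the hand-written mapping, so it could be passed as the body argument of es.indices.put_mapping() in the same way.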
Next, we insert a few new documents:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n106" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">datas = [
    {
        'title': '美國留給伊拉克的是個爛攤子嗎',
        'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm',
        'date': '2011-12-16'
    },
    {
        'title': '公安部：各地校車將享最高路權',
        'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml',
        'date': '2011-12-16'
    },
    {
        'title': '中韓漁警沖突調查：韓警平均每天扣1艘中國漁船',
        'url': 'https://news.qq.com/a/20111216/001044.htm',
        'date': '2011-12-17'
    },
    {
        'title': '中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首',
        'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml',
        'date': '2011-12-18'
    }
]
# Index each document into the 'news' index under the 'politics' type.
for data in datas:
    es.index(index='news', doc_type='politics', body=data)</pre>
Here we define four documents, each with title, url, and date fields, and insert them into Elasticsearch with the index() method. The index name is news and the type is politics.
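Calling index() once per document sends one request each time. For larger batches, the Python client also ships a bulk helper, elasticsearch.helpers.bulk(), which takes an iterable of action dicts. The sketch below only builds those actions so that it runs without a cluster; the function name to_bulk_actions and the sample documents are made up for illustration, and the _type key assumes a 6.x cluster like the one used in this article.

```python
def to_bulk_actions(docs, index, doc_type):
    """Yield one bulk action dict per document.

    `to_bulk_actions` is a hypothetical helper; the '_type' key assumes
    an Elasticsearch 6.x cluster, where mapping types still exist.
    """
    for doc in docs:
        yield {
            '_index': index,
            '_type': doc_type,
            '_source': doc,
        }


# Made-up sample documents, shaped like the ones inserted above.
sample_docs = [
    {'title': 'first sample title', 'date': '2011-12-16'},
    {'title': 'second sample title', 'date': '2011-12-17'},
]
actions = list(to_bulk_actions(sample_docs, 'news', 'politics'))
print(actions[0])

# With a running cluster, the actions could be sent in a single request:
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch(), actions)
```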
Next, let's query the index. No query body is given here, so the search matches every document:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n109" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">result = es.search(index='news', doc_type='politics')
print(result)</pre>
As you can see, all four inserted documents are returned:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n111" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "news",
        "_type": "politics",
        "_id": "c05G9mQBD9BuE5fdHOUT",
        "_score": 1.0,
        "_source": {
          "title": "美國留給伊拉克的是個爛攤子嗎",
          "url": "http://view.news.qq.com/zt2011/usa_iraq/index.htm",
          "date": "2011-12-16"
        }
      },
      {
        "_index": "news",
        "_type": "politics",
        "_id": "dk5G9mQBD9BuE5fdHOUm",
        "_score": 1.0,
        "_source": {
          "title": "中國駐洛杉磯領事館遭亞裔男子槍擊，嫌犯已自首",
          "url": "http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml",
          "date": "2011-12-18"
        }
      },
      {
        "_index": "news",
        "_type": "politics",
        "_id": "dU5G9mQBD9BuE5fdHOUj",
        "_score": 1.0,
        "_source": {
          "title": "中韓漁警沖突調查：韓警平均每天扣1艘中國漁船",
          "url": "https://news.qq.com/a/20111216/001044.htm",
          "date": "2011-12-17"
        }
      },
      {
        "_index": "news",
        "_type": "politics",
        "_id": "dE5G9mQBD9BuE5fdHOUf",
        "_score": 1.0,
        "_source": {
          "title": "公安部：各地校車將享最高路權",
          "url": "http://www.chinanews.com/gn/2011/12-16/3536077.shtml",
          "date": "2011-12-16"
        }
      }
    ]
  }
}</pre>
The matching documents appear under the hits field. Within it, total gives the number of results, and max_score gives the highest relevance score among them.
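In practice you usually want just a few of these values rather than the whole response. The sketch below pulls the total, the maximum score, and the (score, title) pairs out of a response dict. The function name summarize_hits is hypothetical, and the code assumes the 6.x response shape shown above, where hits.total is a plain integer; the sample response is trimmed to the fields that are actually read.

```python
def summarize_hits(response):
    """Return (total, max_score, [(score, title), ...]) from a search response.

    `summarize_hits` is a hypothetical helper; it assumes the response
    layout shown above, with `hits.total` as a plain integer.
    """
    hits = response['hits']
    rows = [(hit['_score'], hit['_source']['title']) for hit in hits['hits']]
    return hits['total'], hits['max_score'], rows


# A trimmed sample response in the same shape as the real one.
sample_response = {
    'hits': {
        'total': 2,
        'max_score': 1.0,
        'hits': [
            {'_score': 1.0, '_source': {'title': '美國留給伊拉克的是個爛攤子嗎'}},
            {'_score': 1.0, '_source': {'title': '公安部：各地校車將享最高路權'}},
        ],
    }
}

total, max_score, rows = summarize_hits(sample_response)
for score, title in rows:
    print(score, title)
```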
We can also run full-text searches, which is where Elasticsearch really shows its character as a search engine:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n114" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import json

from elasticsearch import Elasticsearch

# Full-text match query against the analyzed title field.
dsl = {
    'query': {
        'match': {
            'title': '中國 領事館'
        }
    }
}
es = Elasticsearch()
result = es.search(index='news', doc_type='politics', body=dsl)
# ensure_ascii=False keeps the Chinese titles readable in the output.
print(json.dumps(result, indent=2, ensure_ascii=False))</pre>
Here we query using the DSL supported by Elasticsearch. match requests a full-text search, the field searched is title, and the query text is “中國 領事館”. The results are as follows:
<pre spellcheck="false" class="md-fences mock-cm md-end-block" lang="" cid="n116" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: pre-wrap; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 4px 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.546152,
    "hits": [
      {
        "_index": "news",
        "_type": "politics",
        "_id": "dk5G9mQBD9BuE5fdHOUm",
        "_score": 2.546152,
        "_source": {
          "title": "中國駐洛杉磯領事館遭亞裔男子槍擊，嫌犯已自首",
          "url": "http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml",
          "date": "2011-12-18"
        }
      },
      {
        "_index": "news",
        "_type": "politics",
        "_id": "dU5G9mQBD9BuE5fdHOUj",
        "_score": 0.2876821,
        "_source": {
          "title": "中韓漁警沖突調查：韓警平均每天扣1艘中國漁船",
          "url": "https://news.qq.com/a/20111216/001044.htm",
          "date": "2011-12-17"
        }
      }
    ]
  }
}</pre>
There are two matches here: the first scores about 2.55 and the second about 0.29. The first document contains both terms, “中國” and “領事館”. The second does not contain “領事館” but does contain “中國”, so it is still returned, only with a lower score.
As this shows, a search performs full-text matching against the specified field and sorts the results by their relevance to the query terms. This is the basic skeleton of a search engine.
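By default a match query treats the analyzed terms as alternatives, which is why a document containing only one of them still shows up. If only documents containing every term should match, the match query accepts an operator option set to and. The sketch below builds both variants of the query body; the helper name build_match_query is hypothetical, and it only assembles the dict without contacting a cluster.

```python
def build_match_query(field, text, require_all=False):
    """Build a match query body for one field.

    `build_match_query` is a hypothetical helper. With require_all=True it
    sets the match query's `operator` option to 'and', so a document must
    contain every analyzed term instead of any one of them.
    """
    match_body = {'query': text}
    if require_all:
        match_body['operator'] = 'and'
    return {'query': {'match': {field: match_body}}}


any_term = build_match_query('title', '中國 領事館')
all_terms = build_match_query('title', '中國 領事館', require_all=True)
print(all_terms)
```

Either dict could then be passed as the body argument of es.search(), in the same way as the dsl variable above.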
Elasticsearch also supports many other query types. See the official documentation for details: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/query-dsl.html
That covers a basic introduction to Elasticsearch and the basics of using it from Python. These are only its fundamental features; there is a lot more to explore, and I will keep updating this series, so stay tuned.
Code for this section: https://github.com/Germey/ElasticSearch
Recommended resources
Here are a few good sites for further study:
Elasticsearch: The Definitive Guide (Chinese edition): https://es.xiaoleilu.com/index.html
Full-Text Search Engine Elasticsearch: A Beginner's Tutorial: http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html
Elastic Chinese community: https://www.elasticsearch.cn/