Elasticsearch Search APIs
URL Search API
Syntax:
get <index_name>/_search
post <index_name>/_search
{
}
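Notes:
- <index_name> can be omitted (plain _search). In that case the query covers every index in the cluster.
- <index_name>/_search supports wildcards. For example, user* searches all indices whose names start with user.
- <index_name>/_search accepts several indices separated by ASCII (half-width) commas. For example, user1,user2/_search searches the two indices user1 and user2.
- GET requests can carry request parameters in the URL, using Query String Syntax.
- POST/GET requests can carry a Request Body, using the Query Domain Specific Language (DSL).
- See the Search API reference for details.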
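Query String Syntax
Demo:
# Get the movies from 2012
get movies/_search?q=2012&df=year&sort=year:desc&from=0&size=10&timeout=1s
- q specifies the query, written in Query String Syntax
- df specifies the default field to search
- sort specifies the sort order
- from and size are used for pagination
- q can target specific fields and supports exact and fuzzy queries.
  - Exact query on a single field: q=k:v, for example q=year:2012
  - Generic query against _all (all fields): q=v, for example get movies/_search?q=2012
  - Term query: Beautiful Mind is equivalent to Beautiful OR Mind
  - Phrase query: "Beautiful Mind" is equivalent to Beautiful AND Mind, and the terms must appear in the same order
  - Combining conditions:
    - Per-condition prefixes: q=+k1:v1 -k2:v2 k3:v3. A + prefix means the condition must match; a - prefix means it must not match; conditions without + or - are optional, and the more of them match, the more relevant the document. For example: get movies/_search?q=+year:2012 -title:"Bullet to the Head"
    - Combining several conditions: AND / OR / NOT, or && / || / !. Note: the keywords must be uppercase.
  - Range queries:
    - Interval notation: [] is a closed bound, {} is an open bound
      year:{2010 TO 2018]
      year:[* TO 2018]
    - Comparison notation:
      year:>2012
      year:(>2012 && <=2018)
      year:(+>2010 +<=2018)
  - Wildcard queries (slow and memory-hungry, so not recommended, especially with a leading wildcard)
    - ? matches one character, * matches zero or more characters. For example: GET /movies/_search?q=title:b*
  - Regular expression queries (slow, not recommended). The pattern is wrapped in forward slashes:
    GET /movies/_search?q=title:/[bt]oy/
  - Fuzzy and proximity queries:
    - ~ after a word tolerates one or two wrong letters and returns results by similarity, with a maximum edit distance of 2. After a phrase, ~ sets how far apart the words may be.
      GET /movies/_search?q=title:beautifl~1
      GET /movies/_search?q=title:"Lord Rings"~2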
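The URL parameters above can be assembled programmatically. The following is a minimal Python sketch, not part of any Elasticsearch client: the helper name build_uri_search is hypothetical, and it only builds the path and query string using the standard library.

```python
from urllib.parse import urlencode


def build_uri_search(indices, **params):
    """Return the path and query string for a URI search request.

    `indices` is one index name or a list of names/patterns; an empty
    list targets every index. `params` holds q, df, sort, size, etc.
    (hypothetical helper, for illustration only).
    """
    if isinstance(indices, str):
        indices = [indices]
    target = ",".join(indices)
    path = f"/{target}/_search" if target else "/_search"
    # urlencode percent-encodes reserved characters such as ':' and '"'.
    return f"{path}?{urlencode(params)}" if params else path


# 'from' is a Python keyword, so it has to be passed through a dict.
print(build_uri_search("movies", q="year:2012", sort="year:desc", **{"from": 0}, size=10))
print(build_uri_search(["user1", "user2"], q="name:tom"))
```

Encoding the parameters matters most for queries that contain quotes, spaces, or operators such as + and &&, which would otherwise be misread as URL syntax.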
Query Domain Specific Language (DSL)
Example:
# Find the movies released in 2005
get movies/_search?q=year:2005
post movies/_search
{
"query":{
"match": {"year": 2005}
}
}
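- Pagination:
  { "from": 10, "size": 20, "query": { "match_all": {} } }
  from starts at 0 and 10 results are returned by default. Fetching pages deep in the result set is expensive.
- Sorting
  Prefer sorting on numeric or date fields. When sorting on a multi-valued or analyzed field, the system picks one value to sort by, and there is no way to know which one.
  { "sort": [{"order_date": "desc"}] }
- _source filtering
  If _source is not stored, only the metadata of the matching documents is returned. _source supports wildcards: "_source": ["name*", "desc*"]
  { "_source": ["order_date", "category_keyword"] }
- Script fields
  { "script_fields": { "new_field": { "script":{ "lang": "painless", "source": "doc['order_date'].value+'hello'" } } } }
  Use case: orders carry different exchange rates, and the order price has to be combined with the rate before sorting.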
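For the pagination item above, a page number has to be converted into from and size. A minimal Python sketch follows; the function name page_window is hypothetical, and the limit of 10,000 reflects the default index.max_result_window setting, which a given cluster may have changed.

```python
def page_window(page, size=10, max_result_window=10_000):
    """Translate a 1-based page number into a from/size request body fragment."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size  # 'from' is 0-based
    if start + size > max_result_window:
        # Deep pages are rejected by default; search_after or scroll fit better there.
        raise ValueError("from + size exceeds the result window")
    return {"from": start, "size": size}


body = {**page_window(page=2, size=20), "query": {"match_all": {}}}
print(body)  # {'from': 20, 'size': 20, 'query': {'match_all': {}}}
```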
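Term-Level Queries
- A term is the smallest unit that carries meaning. Both search and natural language processing with statistical language models operate on terms.
- Term-level queries: Term Query / Range Query / Exists Query / Prefix Query / Wildcard Query
- In Elasticsearch, a Term Query does not analyze its input. It treats the input as a whole, looks up the exact term in the inverted index, and scores every document containing that term with the relevance formula.
- Constant Score can turn the query into a filter, which skips scoring and can use the cache to improve performance.
Example:
Create a products index and insert three documents.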
DELETE products
PUT products
{
"settings": {
"number_of_shards": 1
}
}
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "productID" : "XHDK-A-1293-#fJ3","desc":"iPhone" }
{ "index": { "_id": 2 }}
{ "productID" : "KDKE-B-9947-#kL5","desc":"iPad" }
{ "index": { "_id": 3 }}
{ "productID" : "JODL-X-1937-#pV7","desc":"MBP" }
Term Query
Use a Term Query to find documents whose desc is iPhone:
POST /products/_search
{
"query": {
"term": {
"desc": {
"value":"iPhone"
}
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
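Question: one document clearly has desc set to iPhone, so why is nothing found?
Answer:
When a document is indexed, its text fields are analyzed. The Standard Analyzer is used by default and lowercases every token, so iPhone is stored as iphone. A Term Query does not analyze its input, so the uppercase P is never lowercased and the lookup misses. Querying for iphone returns the document: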
POST /products/_search
{
"query": {
"term": {
"desc": {
"value":"iphone"
}
}
}
}
- Use a Term Query to look up a document by productID:
POST /products/_search
{ "query": { "term": { "productID": { "value": "XHDK-A-1293-#fJ3" } } } }
Result:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 0, "relation" : "eq" }, "max_score" : null, "hits" : [ ] } }
Question: why is nothing found?
Answer:
Run the standard analyzer on the text XHDK-A-1293-#fJ3 to see how it is tokenized:
post _analyze
{ "analyzer": "standard", "text": "XHDK-A-1293-#fJ3" }
Result:
{ "tokens" : [ { "token" : "xhdk", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 }, { "token" : "a", "start_offset" : 5, "end_offset" : 6, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "1293", "start_offset" : 7, "end_offset" : 11, "type" : "<NUM>", "position" : 2 }, { "token" : "fj3", "start_offset" : 13, "end_offset" : 16, "type" : "<ALPHANUM>", "position" : 3 } ] }
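Again, the Term Query does not analyze its input: it looks for the whole string, while the index only contains the tokens listed above, so the result is not what we expected.
If we run the following query instead:
POST /products/_search
{ "query": { "term": { "productID": { "value": "xhdk" } } } }
the matching document is returned.
To match the whole value, run the following query:
POST /products/_search
{ "query": { "term": { "productID.keyword": { "value": "XHDK-A-1293-#fJ3" } } } }
Why does adding keyword allow a whole-value match? It comes from the index mapping, where each text field also gets a keyword sub-field that is stored without analysis.
GET /products/_mapping
Result: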
{ "products" : { "mappings" : { "properties" : { "desc" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "productID" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } } } }
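- A Term Query still returns a score, which costs performance. The scoring step can be skipped:
  - Converting the query into a filter skips the TF-IDF calculation and avoids the cost of relevance scoring
  - Filters can make effective use of the cache
POST /products/_search
{ "explain": true, "query": { "constant_score": { "filter": { "term": { "productID.keyword": "XHDK-A-1293-#fJ3" } } } } }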
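The mismatch between analyzed text and an unanalyzed term lookup, described in the two questions above, can be reproduced in a few lines of Python. This is a simplified sketch: simple_analyze is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which happens to give the same tokens as the _analyze output shown for this sample but is not the real Standard Analyzer.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, field):
    """Map each token to the set of document ids whose field contains it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in simple_analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_lookup(index, value):
    """A term query looks the value up as-is, without analyzing it."""
    return sorted(index.get(value, set()))


docs = {
    1: {"productID": "XHDK-A-1293-#fJ3"},
    2: {"productID": "KDKE-B-9947-#kL5"},
    3: {"productID": "JODL-X-1937-#pV7"},
}
index = build_inverted_index(docs, "productID")
print(simple_analyze("XHDK-A-1293-#fJ3"))      # ['xhdk', 'a', '1293', 'fj3']
print(term_lookup(index, "XHDK-A-1293-#fJ3"))  # [] because the whole string was never indexed
print(term_lookup(index, "xhdk"))              # [1]
```

A keyword sub-field behaves like an index whose only token is the original, unmodified string, which is why the productID.keyword lookup succeeds.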
Structured Search
- Searching structured data
  - Dates, booleans, and numbers are structured
- Text can be structured too
  - For example, colored pens come in a discrete set of colors: red, green, blue
  - A blog post may be tagged distributed, search
  - Products on an e-commerce site have UPCs (Universal Product Codes) or other unique identifiers, which must follow a strict, structured format.
- Structured data such as booleans, times, dates, and numbers has an exact format, so logical operations can be applied to it.
- Structured text can be matched exactly or partially
  - Term Query / Prefix Query
- A structured result has only two values, "yes" or "no"
  - Depending on the scenario, you decide whether a structured search needs scoring.
Boolean
Data preparation:
DELETE products
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }
GET products/_mapping
Examples:
# term query on a boolean value, with scoring
POST products/_search
{
"profile": "true",
"explain": true,
"query": {
"term": {
"avaliable": true
}
}
}
# boolean value, converted to a filter through constant_score, without scoring
POST products/_search
{
"profile": "true",
"explain": true,
"query": {
"constant_score": {
"filter": {
"term": {
"avaliable": true
}
}
}
}
}
Numeric Range
- gt: greater than
- lt: less than
- gte: greater than or equal to
- lte: less than or equal to
# term query on a number
POST products/_search
{
"profile": "true",
"explain": true,
"query": {
"term": {
"price": 30
}
}
}
# terms query on numbers
POST products/_search
{
"query": {
"constant_score": {
"filter": {
"terms": {
"price": [
"20",
"30"
]
}
}
}
}
}
# range query on numbers
GET products/_search
{
"query" : {
"constant_score" : {
"filter" : {
"range" : {
"price" : {
"gte" : 20,
"lte" : 30
}
}
}
}
}
}
Date Range
Expression | Description |
---|---|
y | Years |
M | Months |
w | Weeks |
d | Days |
h | Hours |
H | Hours |
m | Minutes |
s | Seconds |
Assume now is 2021-07-04 12:00:00:
Expression | Result |
---|---|
now+1h | 2021-07-04 13:00:00 |
now-1h | 2021-07-04 11:00:00 |
2021.07.04||+1M/d | 2021-08-04 00:00:00 |
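The first two rows of the table above can be checked with a small Python sketch. It is an illustration only: apply_offsets is a hypothetical helper that handles now plus a chain of +/- offsets in weeks, days, hours, minutes, and seconds. Years, months, anchored dates with ||, and rounding with / are deliberately left out.

```python
import re
from datetime import datetime, timedelta

# Units from the table that map directly onto timedelta arguments.
UNITS = {"w": "weeks", "d": "days", "h": "hours", "H": "hours", "m": "minutes", "s": "seconds"}


def apply_offsets(now, expression):
    """Evaluate expressions such as 'now-1h' or 'now-1d+2h' against a given 'now'."""
    if not expression.startswith("now"):
        raise ValueError("only expressions anchored at 'now' are supported")
    rest = expression[3:]
    result, position = now, 0
    for match in re.finditer(r"([+-])(\d+)([wdhHms])", rest):
        if match.start() != position:
            raise ValueError(f"unsupported syntax in {expression!r}")
        sign, amount, unit = match.groups()
        delta = timedelta(**{UNITS[unit]: int(amount)})
        result = result + delta if sign == "+" else result - delta
        position = match.end()
    if position != len(rest):
        raise ValueError(f"unsupported syntax in {expression!r}")
    return result


now = datetime(2021, 7, 4, 12, 0, 0)
print(apply_offsets(now, "now+1h"))  # 2021-07-04 13:00:00
print(apply_offsets(now, "now-1h"))  # 2021-07-04 11:00:00
```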
Example:
POST products/_search
{ "query" : { "constant_score" : { "filter" : { "range" : { "date" : { "gte" : "now-5y" } } } } } }
Exists
The exists query does not return a document when:
- the field does not exist in the document, or its value is null or []
The field still counts as existing in the following cases:
- Empty strings, such as "" or "-"
- Arrays that contain null together with another value, such as [null, "foo"]
- A custom null_value defined in the index mapping
POST products/_search
{ "query": { "constant_score": { "filter": { "exists": { "field": "date" } } } } }
POST products/_search
{ "query": { "constant_score": { "filter": { "bool": { "must_not": { "exists": { "field": "date" } } } } } } }
Terms
Finds documents that contain any of several exact values. Note that this means "contains", not "equals".
PUT my-index-000001
{ "mappings": { "properties": { "color": { "type": "keyword" } } } }
PUT my-index-000001/_bulk
{"index": {"_id": 1}}
{"color": ["blue", "green"]}
{"index": {"_id": 2}}
{"color": "blue"}
GET my-index-000001/_search?pretty
{ "query": { "terms": { "color" : { "index" : "my-index-000001", "id" : "2", "path" : "color" } } } }
POST movies/_search
{ "query": { "constant_score": { "filter": { "term": { "genre.keyword": "Comedy" } } } } }
POST products/_search
{ "query": { "constant_score": { "filter": { "terms": { "productID.keyword": [ "QQPX-R-3956-#aD8", "JODL-X-1937-#pV7" ] } } } } }
Full Text Query
- Types of full-text query
  - Match Query
  - Match Phrase Query
  - Query String Query
  - Multi Match Query
  - Simple Query String Query
- Characteristics
  - Text is analyzed both at index time and at search time. The query string is passed to a suitable analyzer, which produces the list of terms to search for.
  - At query time the input is analyzed first, a low-level query is run for each term, and the results are merged. A score is produced for every document.
Query String Query
Similar to [URL Search](#URL Search API)
GET /movies/_search
{ "profile": true, "query":{ "query_string":{ "default_field": "title", "query": "Beautiful AND Mind" } } }
GET /movies/_search
{ "profile": true, "query":{ "query_string":{ "fields":[ "title", "year" ], "query": "2012" } } }
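Simple Query String Query
- Similar to Query String, but it ignores invalid syntax.
- Supports only part of the query syntax
- AND, OR, and NOT are not supported and are treated as plain strings
- The default relation between terms is OR; an operator can be specified
- Supports some logic through symbols
  - + replaces AND
  - | replaces OR
  - - replaces NOT
GET /movies/_search
{ "profile":true, "query":{ "simple_query_string":{ "query":"Beautiful +mind", "fields":["title"] } } }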
Match Query
# Find movies whose title contains Beautiful OR Mind
POST movies/_search
{ "query": { "match": { "title": { "query": "Beautiful Mind" } } } }
# Find movies whose title contains Beautiful AND Mind
POST movies/_search
{ "query": { "match": { "title": { "query": "Beautiful Mind", "operator": "AND" } } } }
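Match Phrase Query
Unlike a Match Query, the query text is treated as one complete phrase: it is still analyzed, but the resulting terms must all appear, adjacent to each other and in the same order.
POST movies/_search
{ "query": { "match_phrase": { "title":{ "query": "one I love" } } } }
POST movies/_search
{ "query": { "match_phrase": { "title":{ "query": "one love", "slop": 1 } } } }
This kind of exact matching is too strict in most cases. Sometimes we want a document containing "I like swimming and riding!" to match "I like riding" as well. The slop parameter controls how flexible the query is.
slop tells match_phrase how far apart the terms may be while the document is still considered a match. "How far apart" means: how many times would you have to move a term to make the query and the document line up?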
Multi Match Query
The multi_match query builds on the match query. Its key feature is that it can query several fields.
Type | Description | Notes |
---|---|---|
Best Fields | Finds documents that match any field, but uses the _score of the best field | For fields that compete with each other yet are related. The score comes from the best-matching field. |
Most Fields | For cases where several fields contain the same text; the scores of all fields are combined | When handling English content, a common approach is to stem the main field (English Analyzer) so that more documents match, and to index the same text into a sub-field (Standard Analyzer) for more precise matching. The other fields act as signals that raise the relevance of matching documents; the more fields match, the better.<br />Cannot use operator<br />copy_to can solve this, at the cost of extra storage |
Cross Fields | Analyzes the query string into a list of terms, then searches for each term in all fields; a term counts as matched as soon as any field contains it. | For entities such as person names, addresses, and book information, where the information is spread across several fields and a single field is only part of the whole. The goal is to find as many terms as possible in any of the listed fields.<br />Supports operator<br />Compared with copy_to, individual fields can be boosted at search time |
phrase | Same as match_phrase + best_fields | |
phrase_prefix | Same as match_phrase_prefix + best_fields | |
bool_prefix | Same as match_bool_prefix + most_fields | |
POST blogs/_search
{ "query": { "dis_max": { "queries": [ { "match": { "title": "Quick pets" }}, { "match": { "body": "Quick pets" }} ], "tie_breaker": 0.2 } } }
POST blogs/_search
{ "query": { "multi_match": { "type": "best_fields", "query": "Quick pets", "fields": ["title","body"], "tie_breaker": 0.2, "minimum_should_match": "20%" } } }
POST books/_search
{ "query": { "multi_match": { "query": "Quick brown fox", "fields": "*_title" } } }
POST books/_search
{ "query": { "multi_match": { "query": "Quick brown fox", "fields": [ "*_title", "chapter_title^2" ] } } }
DELETE /titles
PUT /titles
{ "mappings": { "properties": { "title": { "type": "text", "analyzer": "english", "fields": {"std": {"type": "text","analyzer": "standard"}} } } } }
POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }
GET /titles/_search
{ "query": { "multi_match": { "query": "barking dogs", "type": "most_fields", "fields": [ "title", "title.std" ] } } }
GET /titles/_search
{ "query": { "multi_match": { "query": "barking dogs", "type": "most_fields", "fields": [ "title^10", "title.std" ] } } }
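Compound queries
Query Context & Filter Context
- Advanced search: supports several text inputs and searches across several fields.
- Search engines usually also offer filtering on conditions such as time and price.
- Elasticsearch has two different contexts, Query and Filter
  - Query Context: computes a relevance score
  - Filter Context: no scoring, can use the cache, gives better performance
Boolean Query
Example:
- Suppose we are searching for a movie with the following conditions
  - The reviews mention Guitar, the user rating is above 3, and the release date is between 1993 and 2000
- This search contains three pieces of logic
  - The review field must contain Guitar
  - The user rating field must be greater than 3
  - The release date field must fall within the given range
Characteristics:
- A boolean query is a combination of one or more query clauses
  - There are four clause types in total; two affect scoring and two do not
    - must: must match, contributes to the score
    - should: optional match, contributes to the score
    - must_not: Filter Context clause, must not match, does not contribute to the score
    - filter: Filter Context clause, must match, does not contribute to the score
- Relevance is not exclusive to full-text search. It also applies to yes|no clauses: the more clauses match, the higher the relevance score. When several clauses are merged into one compound query, such as a boolean query, the score computed by each clause is combined into the overall relevance score.
- Competing fields at the same level carry the same weight
  - Nesting boolean queries changes their influence on the score
  - Nesting a must_not sub-query inside should implements "should not" logic
Syntax:
- Sub-queries can appear in any order
- Sub-queries can be nested
- If there is no must clause, at least one of the should queries has to match; should takes an array
POST /products/_search
{ "query": { "bool" : { "must" : { "term" : { "price" : "30" } }, "filter": { "term" : { "avaliable" : "true" } }, "must_not" : { "range" : { "price" : { "lte" : 10 } } }, "should" : [ { "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } }, { "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } } ], "minimum_should_match" :1 } } }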
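The matching rules of the four clause types can be sketched in Python. This is an illustration of the logic only, not how Elasticsearch executes the query: clauses are modeled as plain predicates, the name bool_matches is hypothetical, and scoring is ignored. The default for minimum_should_match follows the rule stated above (1 when there is no must or filter clause, otherwise 0).

```python
def bool_matches(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=None):
    """Decide whether a document satisfies a bool query made of predicate clauses."""
    if minimum_should_match is None:
        # Without must/filter clauses, at least one should clause has to match.
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


# The four sample products from the Boolean section (field spelling as in the data).
products = {
    1: {"price": 10, "avaliable": True, "productID": "XHDK-A-1293-#fJ3"},
    2: {"price": 20, "avaliable": True, "productID": "KDKE-B-9947-#kL5"},
    3: {"price": 30, "avaliable": True, "productID": "JODL-X-1937-#pV7"},
    4: {"price": 30, "avaliable": False, "productID": "QQPX-R-3956-#aD8"},
}
query = dict(
    must=[lambda d: d["price"] == 30],
    filters=[lambda d: d["avaliable"] is True],
    must_not=[lambda d: d["price"] <= 10],
    should=[
        lambda d: d["productID"] == "JODL-X-1937-#pV7",
        lambda d: d["productID"] == "XHDK-A-1293-#fJ3",
    ],
)
hits = [i for i, d in products.items() if bool_matches(d, minimum_should_match=1, **query)]
print(hits)  # [3]
```

Raising minimum_should_match to 2 leaves no hits, because no single product carries both product IDs.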
How do we solve the problem left over from the Terms Query, which matches "contains" rather than "equals"?
Add a count field and combine the conditions with a boolean query.
# Change the data model and add a field, so that an array can be matched exactly instead of by containment
POST /newmovies/_bulk
{ "index": { "_id": 1 }}
{ "title" : "Father of the Bridge Part II","year":1995, "genre":"Comedy","genre_count":1 }
{ "index": { "_id": 2 }}
{ "title" : "Dave","year":1993,"genre":["Comedy","Romance"],"genre_count":2 }
# must, with scoring
POST /newmovies/_search
{ "query": { "bool": { "must": [ {"term": {"genre.keyword": {"value": "Comedy"}}}, {"term": {"genre_count": {"value": 1}}} ] } } }
# filter: no scoring, the score of the results is 0
POST /newmovies/_search
{ "query": { "bool": { "filter": [ {"term": {"genre.keyword": {"value": "Comedy"}}}, {"term": {"genre_count": {"value": 1}}} ] } } }
# Query Context
POST /products/_search
{ "query": { "bool": { "should": [ { "term": { "productID.keyword": { "value": "JODL-X-1937-#pV7"}} }, {"term": {"avaliable": {"value": true}} } ] } } }
# Nesting, implements "should not" logic
POST /products/_search
{ "query": { "bool": { "must": { "term": { "price": "30" } }, "should": [ { "bool": { "must_not": { "term": { "avaliable": "false" } } } } ], "minimum_should_match": 1 } } }
# Control the precision
POST _search
{ "query": { "bool" : { "must" : { "term" : { "price" : "30" } }, "filter": { "term" : { "avaliable" : "true" } }, "must_not" : { "range" : { "price" : { "lte" : 10 } } }, "should" : [ { "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } }, { "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } } ], "minimum_should_match" :2 } } }
Boosting Query
- Boosting is a way to control relevance
  - It can be applied to an index, a field, or a query clause
- Meaning of the boost parameter
  - boost > 1: the relevance score is raised relative to other clauses
  - 0 < boost < 1: the weight of the score is lowered relative to other clauses
  - boost < 0: contributes a negative score (older versions only; Elasticsearch 7.0 and later reject negative boosts)
- Use it when results containing some content should not be excluded but ranked lower.
Example:
DELETE blogs
POST /blogs/_bulk
{ "index": { "_id": 1 }}
{"title":"Apple iPad", "content":"Apple iPad,Apple iPad" }
{ "index": { "_id": 2 }}
{"title":"Apple iPad,Apple iPad", "content":"Apple iPad" }
POST blogs/_search
{ "query": { "bool": { "should": [ {"match": { "title": { "query": "apple,ipad", "boost": 1.1 } }}, {"match": { "content": { "query": "apple,ipad", "boost": 2 } }} ] } } }
DELETE news
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }
POST news/_search
{ "query": { "bool": { "must": { "match":{"content":"apple"} } } } }
POST news/_search
{ "query": { "bool": { "must": { "match":{"content":"apple"} }, "must_not": { "match":{"content":"pie"} } } } }
POST news/_search
{ "query": { "boosting": { "positive": { "match": { "content": "apple" } }, "negative": { "match": { "content": "pie" } }, "negative_boost": 0.5 } } }
- positive: required, a query object. The clause to run; every returned document satisfies it.
- negative: required, a query object. Documents that also match this clause get a lower relevance score.
- negative_boost: required, a float between 0 and 1.0. The factor used to lower the score of documents that match the negative clause.
Constant Score Query
Disjunction Max Query
Example of a single-string query
PUT /blogs/_doc/1
{ "title": "Quick brown rabbits", "body": "Brown rabbits are commonly seen." }
PUT /blogs/_doc/2
{ "title": "Keeping pets healthy", "body": "My quick brown fox eats rabbits on a regular basis." }
POST /blogs/_search
{ "query": { "bool": { "should": [ { "match": { "title": "Brown fox" }}, { "match": { "body": "Brown fox" }} ] } } }
Expectation:
title: document 1 contains Brown
body: document 1 contains Brown; document 2 contains Brown fox, in the same order as the query, so at first glance document 2 should get the highest relevance score.
Result:
Document 1 scores higher than document 2.
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.90425634, "hits" : [ { "_index" : "blogs", "_type" : "_doc", "_id" : "1", "_score" : 0.90425634, "_source" : { "title" : "Quick brown rabbits", "body" : "Brown rabbits are commonly seen." } }, { "_index" : "blogs", "_type" : "_doc", "_id" : "2", "_score" : 0.77041256, "_source" : { "title" : "Keeping pets healthy", "body" : "My quick brown fox eats rabbits on a regular basis." } } ] }}
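How the score is computed:
- Run the two queries in the should clause
- Add the scores of the two queries
- Multiply by the number of matching clauses
- Divide by the total number of clauses
Use explain to inspect the result and the score breakdown.
title and body compete with each other, so their scores should not simply be added up. Instead, the score of the single best-matching field should be used. A Disjunction Max Query returns every document that matches any of its queries and uses the score of the best-matching field as the final score.
POST blogs/_search
{ "query": { "dis_max": { "queries": [ { "match": { "title": "Brown fox" }}, { "match": { "body": "Brown fox" }} ] } } }
The result now matches the expectation.
The tie_breaker parameter:
- Take the _score of the best-matching clause.
- Multiply the scores of the other matching clauses by tie_breaker
- Sum the scores above and normalize
- It is a float between 0 and 1. 0 means only the best match counts, 1 means all clauses are equally important.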
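The two scoring schemes described above can be compared side by side. The sketch below is a simplification under stated assumptions: the per-field scores are made-up numbers, not real Elasticsearch output, None stands for a clause that did not match, bool_should_score follows the four steps listed for the should query, and dis_max_score implements the tie_breaker steps without the final normalization.

```python
def bool_should_score(clause_scores):
    """Sum of matching scores, times matching clauses, divided by all clauses."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching scores."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# Hypothetical [title, body] scores for the two blog documents.
doc1 = [0.6, 0.3]    # weak matches on "brown" in both fields
doc2 = [None, 0.8]   # strong match on "brown fox" in body only

print(bool_should_score(doc1) > bool_should_score(doc2))  # True: doc 1 wins by adding up
print(dis_max_score(doc2) > dis_max_score(doc1))          # True: doc 2 wins on its best field
print(dis_max_score(doc2, 0.2) > dis_max_score(doc1, 0.2))  # True: still ahead with a small tie_breaker
```

With tie_breaker set to 1 every matching clause counts in full, so dis_max falls back to a plain sum of the matching scores.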
Function Score Query
Scoring and sorting
- By default Elasticsearch sorts by the relevance score of the documents
- One or more fields can be specified for sorting
- Sorting by relevance score alone does not cover some requirements
  - It gives no further control over how relevance influences the ranking
Function Score Query
- After the query runs, each matching document is re-scored through a series of functions and the results are re-sorted by the new score.
- Functions
  - Weight: sets a simple, non-normalized weight for every document
  - Field Value Factor: uses a field's numeric value to modify _score, for example taking "popularity" or "number of likes" into account
  - Random Score: gives each user a different, random scoring result
  - Decay functions: based on the value of a field, the closer it is to a given value, the higher the score
  - Script Score: a custom script gives full control over the logic
- Boost Mode
  - Multiply: the product of the score and the function value
  - Sum: the sum of the score and the function value
  - Min/Max: the minimum or maximum of the score and the function value
  - Replace: the function value replaces the score
- Max Boost caps the score at a maximum value
- Consistent random function:
  - Use case: ads on a site need more exposure
  - Requirement: every user sees a different random ordering, while the same user sees a consistent relative ordering on each visit
DELETE blogs
PUT /blogs/_doc/1
{ "title": "About popularity", "content": "In this post we will talk about...", "votes": 0 }
PUT /blogs/_doc/2
{ "title": "About popularity", "content": "In this post we will talk about...", "votes": 100 }
PUT /blogs/_doc/3
{ "title": "About popularity", "content": "In this post we will talk about...", "votes": 1000000 }
POST /blogs/_search
{ "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes" } } } }
POST /blogs/_search
{ "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" } } } }
POST /blogs/_search
{ "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" , "factor": 0.1 } } } }
POST /blogs/_search
{ "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" , "factor": 0.1 }, "boost_mode": "sum", "max_boost": 3 } } }
POST /blogs/_search
{ "query": { "function_score": { "random_score": { "seed": 911119 } } } }
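The arithmetic behind the field_value_factor requests above can be sketched in Python. The assumptions are labeled: the function names are hypothetical, only the none and log1p modifiers are covered, log1p is taken to mean the base-10 logarithm of 1 plus the scaled value, and max_boost is assumed to cap the function value before it is combined with the query score.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Scale the field value by factor, then apply the modifier."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(1 + scaled)  # smooths out very large vote counts
    raise ValueError(f"modifier {modifier!r} is not covered by this sketch")


def combine(query_score, function_value, boost_mode="multiply", max_boost=float("inf")):
    """Combine the query score with the (capped) function value."""
    capped = min(function_value, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "min":
        return min(query_score, capped)
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "replace":
        return capped
    raise ValueError(f"unknown boost_mode {boost_mode!r}")


# With the default settings a document with 0 votes ends up with score 0,
# and one with 1,000,000 votes dwarfs everything else.
for votes in (0, 100, 1_000_000):
    raw = field_value_factor(votes)
    smoothed = field_value_factor(votes, factor=0.1, modifier="log1p")
    print(votes, combine(1.0, raw), round(combine(1.0, smoothed, "sum", max_boost=3), 3))
```

The loop shows why the later requests add log1p, a factor, and boost_mode sum: the raw vote count dominates the relevance score, while the smoothed value only nudges it.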
Search Template
- Elasticsearch query statements
  - They are critical for both relevance scoring and query performance
- Early in development the query parameters may be known, but the final structure of the DSL often is not
  - A Search Template defines a contract
- Separation of concerns and decoupling
  - Developers, search engineers, performance engineers
GET _search/template
{ "source" : { "query": { "match" : { "{{my_field}}" : "{{my_value}}" } }, "size" : "{{my_size}}" }, "params" : { "my_field" : "message", "my_value" : "foo", "my_size" : 5 } }
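To make the contract idea concrete, the substitution step can be imitated in Python. This is only a sketch of the placeholder mechanism: render_template is a hypothetical helper that replaces {{name}} markers textually, it is not the Mustache engine Elasticsearch uses, and it does not escape parameter values.

```python
import json
import re


def render_template(source, params):
    """Fill {{name}} placeholders in a query template with parameter values."""
    text = json.dumps(source)

    def substitute(match):
        # A missing parameter raises KeyError, which surfaces contract violations early.
        return str(params[match.group(1)])

    return json.loads(re.sub(r"\{\{(\w+)\}\}", substitute, text))


source = {
    "query": {"match": {"{{my_field}}": "{{my_value}}"}},
    "size": "{{my_size}}",
}
params = {"my_field": "message", "my_value": "foo", "my_size": 5}
print(render_template(source, params))
# {'query': {'match': {'message': 'foo'}}, 'size': '5'}
```

The caller only supplies params, while the owner of the template can restructure the query freely as long as the parameter names stay the same. In this sketch size stays a string because the placeholder sits inside quotes.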
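Suggester API
- What are search suggestions?
  - Modern search engines usually offer suggest-as-you-type
  - It helps users with auto-completion or spelling correction while they type. Helping users enter more precise keywords improves document matching in the search phase that follows
  - Google starts by auto-completing. Once the input reaches a certain length and can no longer be completed, for example because of a misspelling, it starts suggesting similar words or sentences
- API
  - Elasticsearch implements this kind of feature through the Suggester API
  - How it works: the input text is split into tokens, and similar terms are looked up in the index dictionary and returned.
  - Term Suggester (correction: returns the correctly spelled word for a misspelled input)
  - Phrase Suggester (phrase completion: type one word and the whole phrase is completed)
  - Completion Suggester (word completion: type the first part and the whole word is completed)
  - Context Suggester (context-aware completion)
- Suggestion Mode
  - Missing: no suggestion is offered if the term already exists in the index
  - Popular: suggests terms that occur more frequently
  - Always: suggestions are offered whether or not the term exists
- Comparing precision, recall, and performance
  - Precision
    - completion > phrase > term
  - Recall
    - term > phrase > completion
  - Performance
    - completion > phrase > term
Term Suggester && Phrase Suggester
The Term Suggester first analyzes the search text into terms, compares each one with the data in the specified index, computes the edit distance, and returns the suggested terms.
Edit distance: this uses the Levenshtein edit distance algorithm. The core idea is how many changes it takes to turn one word into another. For example, to get from elasticseach to elasticsearch one letter, r, has to be inserted. That is one change, so the edit distance between the two words is 1.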
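The edit distance described above can be computed with the classic dynamic-programming formulation. This is a minimal Python sketch for illustration; Elasticsearch relies on Lucene's own implementation, which may differ in details such as how transpositions are counted.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("elasticseach", "elasticsearch"))  # 1
print(levenshtein("lucen", "lucene"))                # 1
```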
The Phrase Suggester adds some logic on top of the Term Suggester.
Commonly used Phrase Suggester parameters include max_errors, the maximum number of terms that may be misspelled, and confidence, a threshold that limits which suggestions are returned and defaults to 1.
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
POST _analyze
{ "analyzer": "standard", "text": ["Elk stack rocks rock"] }
POST /articles/_search
{ "size": 1, "query": { "match": { "body": "lucen rock" } }, "suggest": { "term-suggestion": { "text": "lucen rock", "term": { "suggest_mode": "missing", "field": "body" } } } }
POST /articles/_search
{ "suggest": { "term-suggestion": { "text": "lucen rock", "term": { "suggest_mode": "popular", "field": "body" } } } }
POST /articles/_search
{ "suggest": { "term-suggestion": { "text": "lucen rock", "term": { "suggest_mode": "always", "field": "body" } } } }
POST /articles/_search
{ "suggest": { "term-suggestion": { "text": "lucen hocks", "term": { "suggest_mode": "always", "field": "body", "prefix_length":0, "sort": "frequency" } } } }
POST /articles/_search
{ "suggest": { "my-suggestion": { "text": "lucne and elasticsear rock hello world ", "phrase": { "field": "body", "max_errors":2, "confidence":0, "direct_generator":[{ "field":"body", "suggest_mode":"always" }], "highlight": { "pre_tag": "<em>", "post_tag": "</em>" } } } } }
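Completion Suggester
- The Completion Suggester provides auto-complete. Every character the user types triggers an immediate query to the backend to look for matches.
- Performance requirements are demanding, so Elasticsearch uses a different data structure. Instead of the inverted index, the analyzed data is encoded as an FST and stored alongside the index. Elasticsearch loads the whole FST into memory, which makes lookups very fast
- An FST only supports prefix lookups
- Define the mapping with the completion type
- Index the data
- Run the suggest query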
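The prefix-only behavior noted above can be illustrated in Python. This sketch is not an FST: it uses a sorted list and binary search as a much simpler stand-in, and the helper names are hypothetical. It only shows which inputs a prefix lookup can and cannot find.

```python
from bisect import bisect_left


def build_prefix_index(inputs):
    """Sort the inputs by their lowercased form so prefixes form contiguous runs."""
    return sorted((text.lower(), text) for text in inputs)


def suggest(index, prefix, size=5):
    """Return up to `size` original inputs that start with the given prefix."""
    key = prefix.lower()
    keys = [normalized for normalized, _ in index]
    start = bisect_left(keys, key)  # first entry that is >= the prefix
    results = []
    for normalized, original in index[start:]:
        if not normalized.startswith(key):
            break
        results.append(original)
        if len(results) == size:
            break
    return results


index = build_prefix_index([
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
])
print(suggest(index, "elk "))   # ['Elk stack rocks']
print(suggest(index, "stack"))  # [] because "stack" never starts an input
```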
Context Suggester
- Extends the Completion Suggester
- More context can be added to a search. For example, when the user types "star":
  - Coffee-related: suggest "starbucks"
  - Movie-related: suggest "star wars"
- Two types of context can be defined
  - Category: any string
  - Geo: geographic information
- Define the mapping
  - type
  - name
- Index the data and attach context information to every document
- Run the suggestion query together with the context
# Completion Suggester
DELETE articles
PUT articles
{ "mappings": { "properties": { "title_completion":{ "type": "completion" } } } }
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}
POST articles/_search?pretty
{ "size": 0, "suggest": { "article-suggester": { "prefix": "elk ", "completion": { "field": "title_completion" } } } }
# Context Suggester
DELETE comments
PUT comments
PUT comments/_mapping
{ "properties": { "comment_autocomplete":{ "type": "completion", "contexts":[{ "type":"category", "name":"comment_category" }] } } }
POST comments/_doc
{ "comment":"I love the star war movies", "comment_autocomplete":{ "input":["star wars"], "contexts":{ "comment_category":"movies" } } }
POST comments/_doc
{ "comment":"Where can I find a Starbucks", "comment_autocomplete":{ "input":["starbucks"], "contexts":{ "comment_category":"coffee" } } }
POST comments/_search
{ "suggest": { "MY_SUGGESTION": { "prefix": "sta", "completion":{ "field":"comment_autocomplete", "contexts":{ "comment_category":"coffee" } } } } }
Cross Cluster Search
Pain points of horizontal scaling:
- Single cluster:
  - When scaling horizontally, the number of nodes cannot grow without limit
  - When the cluster's meta information (nodes, indices, cluster state) grows too large, updates become expensive and the single active master turns into a bottleneck, which can stop the whole cluster from working properly
- Early versions used a tribe node to access several clusters, but this approach had problems
  - A tribe node joins every cluster as a client node, and task changes on a cluster's master node have to wait for the tribe node's response before they can continue
  - A tribe node does not persist the cluster state, so initialization is slow after a restart
  - When several clusters contain indices with the same name, only one preference rule can be configured
Cross Cluster Search
- The early tribe node approach had these problems, so it was deprecated
- Elasticsearch 5.3 introduced cross cluster search
  - Any node can act as a federated node and proxy search requests in a lightweight way
  - There is no need to join other clusters as a client node
Example:
# Start 3 clusters
bin/elasticsearch -E node.name=cluster0node -E cluster.name=cluster0 -E path.data=cluster0_data -E discovery.type=single-node -E http.port=9200 -E transport.port=9300
bin/elasticsearch -E node.name=cluster1node -E cluster.name=cluster1 -E path.data=cluster1_data -E discovery.type=single-node -E http.port=9201 -E transport.port=9301
bin/elasticsearch -E node.name=cluster2node -E cluster.name=cluster2 -E path.data=cluster2_data -E discovery.type=single-node -E http.port=9202 -E transport.port=9302
# Apply the dynamic settings on every cluster
PUT _cluster/settings
{ "persistent": { "cluster": { "remote": { "cluster0": { "seeds": [ "127.0.0.1:9300" ], "transport.ping_schedule": "30s" }, "cluster1": { "seeds": [ "127.0.0.1:9301" ], "transport.compress": true, "skip_unavailable": true }, "cluster2": { "seeds": [ "127.0.0.1:9302" ] } } } } }
# cURL
curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
curl -XPUT "http://localhost:9201/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
curl -XPUT "http://localhost:9202/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
# Create test data
curl -XPOST "http://localhost:9200/users/_doc" -H 'Content-Type: application/json' -d'{"name":"user1","age":10}'
curl -XPOST "http://localhost:9201/users/_doc" -H 'Content-Type: application/json' -d'{"name":"user2","age":20}'
curl -XPOST "http://localhost:9202/users/_doc" -H 'Content-Type: application/json' -d'{"name":"user3","age":30}'
# Query
GET /users,cluster1:users,cluster2:users/_search
{ "query": { "range": { "age": { "gte": 20, "lte": 40 } } } }