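Elasticsearch
- A distributed, real-time document store in which every field is indexed and searchable
- A distributed, real-time analytics and search engine
- Scales out to hundreds of servers and handles petabytes of structured or unstructured data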
Interacting with Elasticsearch####
- Node client
- Transport client, port 9300
RESTful API over HTTP, using JSON as the data exchange format####
Basic usage
Query all documents (returns the first ten)
GET /megacorp/employee/_search
Using the Elasticsearch Query DSL####
Some commonly used DSL statements####
- List all indices
GET /_cat/indices?v
- Create an index
PUT /bookdb_index
{
"settings": {"number_of_shards": 1}
}
- Bulk-upload documents
POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distributed real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
- Basic query a (find records whose title matches "in action")
GET /bookdb_index/book/_search
{
"query": {
"multi_match": {
"query": "in action",
"fields": ["title"]
}
}
}
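- Basic query b (specifying returned fields, highlighting, etc.)
In the example below, size limits the number of results returned, from sets the starting offset, _source selects the fields to return, and highlight turns on hit highlighting.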
POST /bookdb_index/book/_search
{
"query": {
"match" : {
"title" : "in action"
}
},
"size": 2,
"from": 0,
"_source": [ "title", "summary", "publish_date" ],
"highlight": {
"fields" : {
"title" : {}
}
}
}
For multi-word queries, match lets you specify the and operator in place of the default or operator. You can also set the minimum_should_match option to tune how relevant the returned results must be.
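The two options mentioned above can be sketched as a small request-body builder. This is a minimal sketch, not an official client: the helper name match_query is hypothetical, and it only assembles the JSON body that would be sent to the _search endpoint.

```python
import json


def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a match query body (hypothetical helper, no HTTP call is made)."""
    options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        # Only sent when explicitly requested, e.g. 2 or "75%".
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Require both "in" and "action" to appear in the title.
body = match_query("title", "in action", operator="and")
print(json.dumps(body, indent=2))
```

The printed body can be pasted after POST /bookdb_index/book/_search in the same way as the other examples.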
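- Boosting
Because we are searching across multiple fields, we may want to raise the score contributed by one of them. In the example below, we boost the summary field by a factor of three to increase its importance; as a result, document 4 becomes more relevant.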
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "elasticsearch guide",
"fields": ["title", "summary^3"]
}
},
"_source": ["title", "summary", "publish_date"]
}
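- Bool query
To return more relevant or more specific results, the AND/OR/NOT operators can be used to refine a query. They are implemented as a bool query, which accepts the following parameters:
a. must is equivalent to AND
b. must_not is equivalent to NOT
c. should is equivalent to OR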
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must": [
{ "bool" : { "should": [
{ "match": { "title": "Elasticsearch" }},
{ "match": { "title": "Solr" }} ] }
},
{ "match": { "authors": "clinton gormley" }}
],
"must_not": { "match": {"authors": "radu gheorge" }}
}
}
}
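To make the AND/OR/NOT semantics concrete, here is a toy in-memory emulation over the four sample books. It is a sketch under simplifying assumptions, not how Elasticsearch evaluates queries: the helpers tokens, match and bool_query are hypothetical, scoring is ignored, and a should clause is treated as required only when no must clause is present.

```python
import re

BOOKS = [
    {"id": 1, "title": "Elasticsearch: The Definitive Guide",
     "authors": ["clinton gormley", "zachary tong"]},
    {"id": 2, "title": "Taming Text: How to Find, Organize, and Manipulate It",
     "authors": ["grant ingersoll", "thomas morton", "drew farris"]},
    {"id": 3, "title": "Elasticsearch in Action",
     "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    {"id": 4, "title": "Solr in Action",
     "authors": ["trey grainger", "timothy potter"]},
]


def tokens(value):
    """Lowercase word set of a string or list of strings (toy analyzer)."""
    text = " ".join(value) if isinstance(value, list) else value
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match(field, query):
    """Toy match clause: true when any query word occurs in the field."""
    words = tokens(query)
    return lambda doc: bool(words & tokens(doc[field]))


def bool_query(must=(), should=(), must_not=()):
    """Toy bool clause combining other clauses with AND / OR / NOT."""
    def clause(doc):
        if not all(c(doc) for c in must):
            return False
        if any(c(doc) for c in must_not):
            return False
        if should and not must:
            return any(c(doc) for c in should)
        return True
    return clause


query = bool_query(
    must=[
        bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")]),
        match("authors", "clinton gormley"),
    ],
    must_not=[match("authors", "radu gheorge")],
)
hits = [book["id"] for book in BOOKS if query(book)]
print(hits)
```

Only book 1 satisfies all three conditions: its title contains Elasticsearch, one of its authors matches, and it is not written by the excluded author.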
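- Fuzzy query
Fuzzy matching can be enabled on match and multi_match queries to catch spelling mistakes. The degree of fuzziness is specified as an edit distance from the original term.
Note: when a term is longer than 5 characters, a fuzziness of AUTO is equivalent to a value of 2. However, about 80% of misspellings have an edit distance of 1, so setting fuzziness to 1 may improve your overall search performance.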
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "comprihensiv guide",
"fields": ["title", "summary"],
"fuzziness": "AUTO"
}
},
"_source": ["title", "summary", "publish_date"],
"size": 1
}
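The edit distance that fuzziness refers to can be illustrated with a short function. This is a sketch that assumes the optimal string alignment variant (insert, delete, substitute, and swap of two adjacent characters each cost 1); the function name osa_distance is hypothetical and the code is not taken from Elasticsearch.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where adjacent transpositions count as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            swapped = a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
            if i > 1 and j > 1 and swapped:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(osa_distance("comprihensiv", "comprehensive"))
print(osa_distance("saerch", "search"))
```

Under this definition the misspelled query term comprihensiv is two edits away from comprehensive, so it only matches when the allowed fuzziness is at least 2.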
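- Wildcard query
Wildcard queries let you specify a pattern to match instead of a whole term.
? matches any single character
* matches zero or more characters
For example, to find all records with an author whose name starts with the letter 't':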
POST /bookdb_index/book/_search
{
"query": {
"wildcard" : {
"authors" : "t*"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields" : {
"authors" : {}
}
}
}
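As a rough analogy, the same ? and * semantics can be tried locally with Python's fnmatch module. This is only a sketch: the term list is an assumption about how the authors of document 4 and one author of document 2 would be split into lowercase terms, and fnmatch also accepts [seq] patterns that are not part of the wildcard syntax described above.

```python
from fnmatch import fnmatchcase

# Assumed lowercase terms for the authors field of a few sample documents.
AUTHOR_TERMS = ["trey", "grainger", "timothy", "potter", "thomas", "morton"]

# "t*" keeps every term that starts with t, whatever follows.
starts_with_t = [term for term in AUTHOR_TERMS if fnmatchcase(term, "t*")]
print(starts_with_t)

# "tr?y" requires exactly one character between "tr" and "y".
print(fnmatchcase("trey", "tr?y"))
```

The pattern is tested against individual terms rather than the whole field value, which is why a multi-word author name can match on either of its words.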
- Regexp query
Regexp queries let you use more complex patterns than wildcard queries:
POST /bookdb_index/book/_search
{
"query": {
"regexp" : {
"authors" : "t[a-z]*y"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields" : {
"authors" : {}
}
}
}
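The pattern above can be tried locally with Python's re module. This is a sketch under two assumptions: the term list is a guess at the lowercase terms indexed for the authors field, and fullmatch is used because the pattern has to cover the whole term. Python's regex dialect is not identical to the one Elasticsearch uses, so more advanced patterns may behave differently.

```python
import re

# Assumed lowercase terms for the authors field of a few sample documents.
AUTHOR_TERMS = ["trey", "grainger", "timothy", "potter", "thomas", "morton"]

# Starts with t, ends with y, any lowercase letters in between.
pattern = re.compile(r"t[a-z]*y")
regexp_hits = [term for term in AUTHOR_TERMS if pattern.fullmatch(term)]
print(regexp_hits)
```

Compared with the wildcard t*, this pattern drops thomas because it does not end in y.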
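- Match Phrase query
A match phrase query requires that every term in the query string is present in the document, in the same order as in the query string, and adjacent to one another. By default the terms must be directly next to each other, but you can set slop to specify how far apart they may be while the document still counts as a match.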
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query": "search engine",
"fields": ["title", "summary"],
"type": "phrase",
"slop": 3
}
},
"_source": [ "title", "summary", "publish_date" ]
}
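The effect of slop can be approximated with a small position-based check. This is a simplified sketch, not the Lucene implementation: tokenize and phrase_matches are hypothetical helpers, the tokenizer is a plain lowercase word split, and a match is assumed whenever the term positions, shifted by each term's offset in the phrase, lie within slop of each other.

```python
import re
from itertools import product


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(text, phrase, slop=0):
    """True when the phrase terms occur in text within the given slop."""
    doc_tokens = tokenize(text)
    terms = tokenize(phrase)
    # For each phrase term, its positions in the text minus its phrase offset.
    shifted = [
        [pos - offset for pos, tok in enumerate(doc_tokens) if tok == term]
        for offset, term in enumerate(terms)
    ]
    if any(not positions for positions in shifted):
        return False  # at least one term is missing entirely
    return any(max(c) - min(c) <= slop for c in product(*shifted))


adjacent = "a scalable search engine using Apache Solr"
spread = "real-time search and analytics engine"
print(phrase_matches(adjacent, "search engine"))
print(phrase_matches(spread, "search engine", slop=3))
print(phrase_matches(spread, "search engine", slop=1))
```

In the second text two words sit between search and engine, so the phrase only matches once slop is at least 2.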
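- Match Phrase Prefix query
Match phrase prefix queries provide search-as-you-type matching, in other words a basic form of query-time autocompletion, without requiring you to prepare your data in any way. Like the match_phrase query, it accepts a slop parameter (to relax word order and relative positions) and a max_expansions parameter (to limit the number of terms the prefix expands to, which reduces resource usage).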
POST /bookdb_index/book/_search
{
"query": {
"match_phrase_prefix" : {
"summary": {
"query": "search en",
"slop": 3,
"max_expansions": 10
}
}
},
"_source": [ "title", "summary", "publish_date" ]
}
Note: query-time search-as-you-type carries a significant performance cost. A better solution is index-time search-as-you-type. For more information, see the Completion Suggester API or the use of Edge-Ngram filters.
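The index-time idea behind edge n-grams can be sketched in a few lines: every term is stored under each of its leading prefixes, so a typed prefix becomes an exact lookup. The function name edge_ngrams, the min_gram and max_gram values, and the dictionary index are illustrative assumptions, not Elasticsearch settings taken from this document.

```python
def edge_ngrams(term, min_gram=1, max_gram=5):
    """Return the prefixes of term with lengths min_gram..max_gram."""
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


# Build a toy prefix index at "index time".
index = {}
for term in ["search", "engine"]:
    for gram in edge_ngrams(term, min_gram=2, max_gram=4):
        index.setdefault(gram, set()).add(term)

# At "query time" the typed prefix is a plain dictionary lookup.
completions = sorted(index.get("en", set()))
print(edge_ngrams("engine", min_gram=2, max_gram=4))
print(completions)
```

The trade-off is a larger index in exchange for cheaper queries, which is the reason the note above prefers the index-time approach.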
- Query String
The query_string query provides a compact shorthand syntax for running multi_match, bool, boosting, fuzzy, wildcard, regexp, and range queries. In the example below, we run a fuzzy search for the terms "search algorithm" on books whose author is "grant ingersoll" or "tom morton", searching all fields but boosting summary by a factor of 2.
POST /bookdb_index/book/_search
{
"query": {
"query_string" : {
"query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
"fields": ["_all", "summary^2"]
}
},
"_source": [ "title", "summary", "authors" ],
"highlight": {
"fields" : {
"summary" : {}
}
}
}
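- Simple Query String
The simple_query_string query is a variant of the query_string query that is better suited to exposing a single search box to users: it replaces AND/OR/NOT with +/|/- respectively, and it discards the invalid parts of a query instead of throwing an exception when the user makes a mistake.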
POST /bookdb_index/book/_search
{
"query": {
"simple_query_string" : {
"query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
"fields": ["_all", "summary^2"]
}
},
"_source": [ "title", "summary", "authors" ],
"highlight": {
"fields" : {
"summary" : {}
}
}
}
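- Term/Terms query
The examples above are all full-text search examples. Sometimes we are more interested in structured search, where we want exact matches returned. The term and terms queries help with this. In the example below, we search the index for all books published by Manning.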
POST /bookdb_index/book/_search
{
"query": {
"term" : {
"publisher": "manning"
}
},
"_source" : ["title","publish_date","publisher"]
}
- Term query - Sorted
The results of a term query (like the results of any other query) can easily be sorted, and multi-level sorting is also supported:
POST /bookdb_index/book/_search
{
"query": {
"term" : {
"publisher": "manning"
}
},
"_source" : ["title","publish_date","publisher"],
"sort": [
{ "publish_date": {"order":"desc"}},
{ "title": { "order": "desc" }}
]
}
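The multi-level ordering requested above can be mimicked locally on the three Manning titles. This is a sketch that assumes plain string comparison for both keys: ISO dates sort correctly as strings, but Elasticsearch may order an analyzed title field differently from this whole-string comparison.

```python
MANNING_BOOKS = [
    {"title": "Taming Text: How to Find, Organize, and Manipulate It",
     "publish_date": "2013-01-24"},
    {"title": "Elasticsearch in Action", "publish_date": "2015-12-03"},
    {"title": "Solr in Action", "publish_date": "2014-04-05"},
]

# Primary key publish_date, secondary key title, both descending.
ordered = sorted(
    MANNING_BOOKS,
    key=lambda book: (book["publish_date"], book["title"]),
    reverse=True,
)
titles = [book["title"] for book in ordered]
print(titles)
```

The secondary key only matters when two books share the same publish_date, which does not happen in this sample data.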
- Range query
Another example of a structured query is the range query. In this example, we look for books published in 2015.
POST /bookdb_index/book/_search
{
"query": {
"range" : {
"publish_date": {
"gte": "2015-01-01",
"lte": "2015-12-31"
}
}
},
"_source" : ["title","publish_date","publisher"]
}
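The inclusive gte/lte bounds can be checked against the sample publish dates with a short local sketch. The helper in_range is hypothetical and simply compares parsed dates; it says nothing about how Elasticsearch stores or indexes date fields.

```python
from datetime import date

# publish_date of each sample document, keyed by _id.
PUBLISH_DATES = {
    1: "2015-02-07",
    2: "2013-01-24",
    3: "2015-12-03",
    4: "2014-04-05",
}


def in_range(value, gte, lte):
    """True when gte <= value <= lte, with both bounds inclusive."""
    day = date.fromisoformat(value)
    return date.fromisoformat(gte) <= day <= date.fromisoformat(lte)


hits_2015 = [
    doc_id for doc_id, published in PUBLISH_DATES.items()
    if in_range(published, "2015-01-01", "2015-12-31")
]
print(hits_2015)
```

Documents 1 and 3 fall inside the range, so they are the two books expected from the query above.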
- Filtered query
Filtered queries let you filter the results of a query. In our example, we search for books with Elasticsearch in the title or summary, but we want to keep only those with at least 20 reviews.
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"range" : {
"num_reviews": {
"gte": 20
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews"]
}
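The split between the must clause and the filter clause can be sketched locally. This is a toy emulation under stated assumptions: the summaries are abbreviated from the sample documents, the helpers tokens and search are hypothetical, and scoring is left out entirely. The point is only that the filter is a yes/no test applied on top of the text match.

```python
import re

# Sample documents with abbreviated summaries.
BOOKS = [
    {"id": 1, "title": "Elasticsearch: The Definitive Guide",
     "summary": "real-time search and analytics engine", "num_reviews": 20},
    {"id": 2, "title": "Taming Text: How to Find, Organize, and Manipulate It",
     "summary": "organize text using approaches such as full-text search",
     "num_reviews": 12},
    {"id": 3, "title": "Elasticsearch in Action",
     "summary": "build scalable search applications using Elasticsearch",
     "num_reviews": 18},
    {"id": 4, "title": "Solr in Action",
     "summary": "Comprehensive guide to implementing a scalable search engine",
     "num_reviews": 23},
]


def tokens(text):
    """Lowercase word set of a string (toy analyzer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(books, word, min_reviews):
    """Ids of books containing word in title or summary, then filtered."""
    hits = []
    for book in books:
        in_text = any(word in tokens(book[f]) for f in ("title", "summary"))
        if not in_text:
            continue  # the must clause did not match
        if book["num_reviews"] < min_reviews:
            continue  # the filter is pass/fail and does not affect ranking
        hits.append(book["id"])
    return hits


filtered_hits = search(BOOKS, "elasticsearch", min_reviews=20)
print(filtered_hits)
```

Book 3 matches the text condition but has only 18 reviews, so the filter removes it and only book 1 remains.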